Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction Paper • 2502.17239 • Published Feb 24 • 3
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Paper • 2503.14324 • Published Mar 18 • 2
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning Paper • 2503.19470 • Published Mar 25 • 18
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer Paper • 2304.11818 • Published Apr 24, 2023
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 41
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3 • 8
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning Paper • 2410.12952 • Published Oct 16, 2024