FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 63
MobileCLIP2: Improving Multi-Modal Reinforced Training Paper • 2508.20691 • Published 12 days ago • 3
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published 13 days ago • 58
view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 523
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality By saurabhdash and 3 others • Mar 4 • 75
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 147
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Paper • 2502.13233 • Published Feb 18 • 15
Baichuan-M1: Pushing the Medical Capability of Large Language Models Paper • 2502.12671 • Published Feb 18 • 1
Scaling Test-Time Compute Without Verification or RL is Suboptimal Paper • 2502.12118 • Published Feb 17 • 1
Is Noise Conditioning Necessary for Denoising Generative Models? Paper • 2502.13129 • Published Feb 18 • 1
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario Paper • 2501.10132 • Published Jan 17 • 22