InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 1 day ago
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published 9 days ago
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published 6 days ago
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 4 days ago
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 14 days ago
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments Paper • 2504.03886 • Published 11 days ago
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 8 days ago
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 15 days ago
Challenges and Paths Towards AI for Software Engineering Paper • 2503.22625 • Published 18 days ago
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging Paper • 2503.22236 • Published 19 days ago
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation Paper • 2503.20672 • Published 20 days ago