InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 276
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 26
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 83
WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation Paper • 2505.01490 • Published May 2 • 5
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 77
view article Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.27k
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Paper • 2410.05255 • Published Oct 7, 2024 • 5
DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks Paper • 2307.05628 • Published Jul 11, 2023 • 10
Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering Paper • 2402.00827 • Published Feb 1, 2024 • 2
LLaVa-NeXT Collection LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19, 2024 • 31
DreamReward: Text-to-3D Generation with Human Preference Paper • 2403.14613 • Published Mar 21, 2024 • 38