OmniCorpus Collection A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text • 6 items • Updated Apr 20 • 2
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published May 29 • 46
A Simple Aerial Detection Baseline of Multimodal Language Models Paper • 2501.09720 • Published Jan 16 • 2
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10 • 3
view article Article Preference Optimization for Vision Language Models By qgallouedec and 3 others • Jul 10, 2024 • 79
OmniCorpus 🐳 Collection [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text https://github.com/OpenGVLab/OmniCorpus • 5 items • Updated May 14 • 1
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 69
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 31
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 60
InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language Paper • 2305.05662 • Published May 9, 2023 • 4