Tulu 3 Datasets Collection All datasets released with Tulu 3 -- state-of-the-art open post-training recipes. • 33 items • Updated Apr 30 • 88
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Apr 30 • 76
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30, 2024 • 49
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Paper • 2502.17262 • Published Feb 24 • 22
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published Oct 7, 2024 • 46
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5, 2024 • 62