BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12, 2024
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22, 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness Paper • 2407.01942 • Published Jul 2, 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17, 2024
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets Paper • 2312.10253 • Published Dec 15, 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use Paper • 2308.06595 • Published Aug 12, 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text Paper • 2304.06939 • Published Apr 14, 2023