LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 8 days ago • 55
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 124
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated Dec 13, 2024 • 128
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 50
A Case Study of Web App Coding with OpenAI Reasoning Models Paper • 2409.13773 • Published Sep 19, 2024 • 6
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4, 2024 • 23
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 36
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 40
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 12 days ago • 293
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5, 2024 • 27
LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 20 items • Updated 3 days ago • 101
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 98
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization Paper • 2407.14257 • Published Jul 19, 2024 • 5