Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published 15 days ago • 26
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 116
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 108
Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • Jun 2 • 24
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding Paper • 2502.11492 • Published Feb 17 • 2
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19 • 16
Article Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth By mlabonne • Jul 29, 2024 • 347
Article Preference Optimization for Vision Language Models By qgallouedec and 3 others • Jul 10, 2024 • 79
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis Paper • 2505.09358 • Published May 14 • 25
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • May 15 • 115
Article Synthetic data: save money, time and carbon with open source By MoritzLaurer • Feb 16, 2024 • 76
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 469
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Paper • 2504.05506 • Published Apr 7 • 23
Article Optimise AI Models and Make Them Faster, Smaller, Cheaper, Greener By PrunaAI and 2 others • Apr 4 • 18