Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3 • 242
OpenVLA: An Open-Source Vision-Language-Action Model Paper • 2406.09246 • Published Jun 13, 2024 • 42
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • Jul 9 • 668
Article 🤔👀🎬🖥️📖 Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • Jun 21 • 68
Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 300
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Paper • 2412.01292 • Published Dec 2, 2024 • 13
Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 102
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published Sep 10, 2024 • 15
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold Paper • 2408.14608 • Published Aug 26, 2024 • 8
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29, 2024 • 29
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29, 2024 • 33
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 52
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21, 2024 • 23
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA By ybelkada and 4 others • May 24, 2023 • 163
Article Google releases Gemma 2 2B, ShieldGemma and Gemma Scope By Xenova and 3 others • Jul 31, 2024 • 60