view article Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 265
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Paper • 2412.01292 • Published Dec 2, 2024 • 13
view article Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 101
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published Sep 10, 2024 • 15
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold Paper • 2408.14608 • Published Aug 26, 2024 • 8
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29, 2024 • 29
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29, 2024 • 33
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 51
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21, 2024 • 23
view article Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA By ybelkada and 4 others • May 24, 2023 • 153
view article Article Google releases Gemma 2 2B, ShieldGemma and Gemma Scope By Xenova and 3 others • Jul 31, 2024 • 59