Article Make your ZeroGPU Spaces go brrr with PyTorch ahead-of-time compilation By cbensimon and 3 others • 7 days ago • 44
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling Paper • 2508.16790 • Published 17 days ago • 7
Article Vision Language Model Alignment in TRL ⚡️ By sergiopaniego and 4 others • Aug 7 • 78
mEUltilingual speechLLM projectors Collection Multilingual projectors trained with SLAM-ASR for EU languages. • 1 item • Updated Jul 10 • 5
Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning Paper • 2506.02627 • Published Jun 3 • 2
Article How Much Power does a SOTA Open Video Model Use? ⚡🎥 By jdelavande and 2 others • Jul 2 • 15
Article Gemma 3n fully available in the open-source ecosystem! By ariG23498 and 7 others • Jun 26 • 116
Article Common Pitfalls in Sharing Open Source Models on Hugging Face (and How to Dodge Them) By FriendliAI and 2 others • Jul 1 • 21
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 131
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models Paper • 2505.19223 • Published May 25 • 8
This Time is Different: An Observability Perspective on Time Series Foundation Models Paper • 2505.14766 • Published May 20 • 40
Article NVIDIA Cosmos Now Available On Hugging Face For Physical AI Reasoning By PranjaliJoshi and 1 other • May 19 • 26
The Audio-Visual BatVision Dataset for Research on Sight and Sound Paper • 2303.07257 • Published Mar 13, 2023 • 1
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields Paper • 2405.18213 • Published May 28, 2024 • 1
Article Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models. By tiiuae and 9 others • May 15 • 36
Article Blazingly fast whisper transcriptions with Inference Endpoints By mfuntowicz and 5 others • May 13 • 75