Phi-4 • Collection • Phi-4 family of small language, multi-modal and reasoning models. • 13 items • Updated 8 days ago
🧠 Traditional Chinese Reasoning Datasets • Collection • A curated collection of datasets designed to evaluate and train reasoning capabilities in Traditional Chinese across various domains. • 3 items • Updated 8 days ago
LiveCC • Collection • Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) • 8 items • Updated 17 days ago
BitNet • Collection • 🔥 BitNet family of large language models (1-bit LLMs). • 7 items • Updated 9 days ago
Llama Nemotron • Collection • Open, Production-ready Enterprise Models • 6 items • Updated 1 day ago
Physical AI • Collection • Collection of commercial-grade datasets for physical AI developers • 10 items • Updated 4 days ago
DRAMA • Collection • A collection of small (sub-1B-parameter) multilingual dense retrievers that generalize well across many tasks and languages. • 3 items • Updated Feb 26
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions • Paper • arXiv 2502.13124 • Published Feb 18
OpenR1-Math • Collection • Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated Mar 11
Llasa • Collection • TTS foundation model compatible with the Llama framework (160k hours of tokenized speech data released) • 11 items • Updated Feb 21
olmOCR • Collection • olmOCR is a document recognition pipeline for efficiently converting documents into plain text. See olmocr.allenai.org. • 4 items • Updated 9 days ago
VideoLLaMA3 • Collection • Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated Mar 11
Ovis2 • Collection • Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated Mar 25