view article Article Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications By nmmursit and 5 others • 10 days ago • 26
view article Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • 28 days ago • 72
view article Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • Aug 5 • 490
SauerkrautLM-Multilingual-(Reason)-ColBERT Collection SauerkrautLM ColBERT is a suite of Late-Interaction retrieval models built with PyLate’s ColBERT architecture and tuned for seven European languages. • 7 items • Updated Aug 3 • 18
view article Article Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub By drbh and 6 others • Jun 12 • 133
view article Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • Jul 29 • 170
GLiCLass-V3 Collection Models for zero-shot text classification that are up to 50 times faster than Cross-Encoders and show the same or higher accuracy. • 8 items • Updated 26 days ago • 15
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 62
Reward Models Collection Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated 6 days ago • 21
view article Article Introducing ColQwen-Omni: Retrieve in every modality By manu and 4 others • Jul 17 • 69
view article Article FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages By davanstrien and 5 others • Jul 8 • 30
view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 647
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published Jul 1 • 78
view article Article Should We Still Pretrain Encoders with Masked Language Modeling? By Nicolas-BZRD and 3 others • Jul 2 • 21
MaLA corpus Collection MaLA Corpus for Massive Language Adaptation of Large Language Models https://mala-lm.github.io • 18 items • Updated Jun 9 • 7
view article Article 🥬 LettuceDetect Goes Multilingual: Fine-tuning EuroBERT on Synthetic Translations By adaamko and 1 other • May 19 • 9
Multilingual Hallucination Detection Collection These are our EuroBERT fine-tunes on our translated RAGTruth datasets. • 13 items • Updated May 18 • 5
Llama Nemotron Collection Open, Production-ready Enterprise Models • 11 items • Updated 6 days ago • 68