FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages By davanstrien and 5 others • 1 day ago • 23
LLM Hallucinations: bug or feature? The US Supreme Court 2025 cases experiment By dvilasuero • 1 day ago • 16
We're open-sourcing "The Amazing Hand", a fully 3D printed robotic hand for less than $200 ✌️✌️✌️ By pollen-robotics and 2 others • 1 day ago • 15
Why We Built the OpenMDW License: A Comprehensive License for ML Models By linuxfoundation • 7 days ago • 14
Should We Still Pretrain Encoders with Masked Language Modeling? By Nicolas-BZRD and 3 others • 7 days ago • 19
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 181
Introduction to MedVideoCap-55K: A New, Large-Scale, High-Quality Medical Video-Caption Pair Dataset By wangrongsheng • 14 days ago • 8
🅰️ℹ️ 1️⃣0️⃣1️⃣ **What is HtmlRAG, Multimodal RAG and Agentic RAG?** By Kseniase and 1 other • Jan 9 • 13
Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models By tiiuae and 8 others • 5 days ago • 6
Can AI Be Consentful? Rethinking Permission in the Age of Synthetic Everything By giadap • 1 day ago • 5
Accelerating AI for Drug Discovery: Ginkgo’s GDPx Functional Genomics and GDPa Antibody Developability Dataset Series By cgeorgiaw and 1 other • 15 days ago • 15