Common Pile v0.1 • Collection • All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 4 days ago • 23
Article • Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • By ariG23498 and 3 others • Mar 12 • 430
Tulu 3 Datasets • Collection • All datasets released with Tulu 3: state-of-the-art open post-training recipes. • 33 items • Updated Apr 30 • 83
Qwen2.5 • Collection • Qwen2.5 language models, with pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Apr 28 • 618
Article • MTEB Leaderboard : User guide and best practices • By lyon-nlp-group • Mar 13, 2024 • 9
Article • 🦙⚗️ Using Llama3 and distilabel to build fine-tuning datasets • By dvilasuero • Jun 4, 2024 • 79
Article • 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 • By dvilasuero • Jul 30, 2024 • 38