Article Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM By INSAIT-Institute and 2 others • Apr 23 • 55
Preference Datasets for DPO Collection This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated Dec 11, 2024 • 43
Article Finding Moroccan Arabic (Darija) in Fineweb 2 By omarkamali and 3 others • Dec 8, 2024 • 23
Article The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare By aaditya and 2 others • Apr 19, 2024 • 173
Arabic Aya DPO Datasets Collection Our synthetic DPO datasets for Arabic Aya. • 5 items • Updated Jun 4, 2024 • 4
Tokenization Falling Short: The Curse of Tokenization Paper • 2406.11687 • Published Jun 17, 2024 • 16
CroissantLLM: A Truly Bilingual French-English Language Model Paper • 2402.00786 • Published Feb 1, 2024 • 27
Article 🥐CroissantLLM: A Truly Bilingual French-English Language Model By manu • Feb 5, 2024 • 14
FrenchBench Evaluation datasets Collection These datasets are used to evaluate models on French performance using: https://github.com/EleutherAI/lm-evaluation-harness (from CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 7
Article Introducing the Open Arabic LLM Leaderboard By alielfilali01 and 4 others • May 14, 2024 • 96
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 36
Article Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B By asoria and 3 others • Apr 4, 2024 • 28