Teuken-7B-v0.6 Collection OpenGPT-X Teuken 7B models trained on 6 trillion tokens. • 2 items • Updated 16 days ago • 3
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models Paper • 2505.22232 • Published May 28 • 18
JQL Collection Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models • 5 items • Updated May 30
Running • JQL: Judging Quality Across Languages 🦊 Filter multilingual data to improve LLM training