Teuken-7B-v0.6 Collection OpenGPT-X Teuken 7B models trained on 6 trillion tokens • 1 item • Updated Dec 10, 2024
Teuken-7B-v0.55 Collection OpenGPT-X Teuken 7B models trained on 5.5 trillion tokens • 3 items • Updated Dec 3, 2024
Teuken-7B-v0.5 Collection OpenGPT-X Teuken 7B models trained on 5 trillion tokens • 4 items • Updated Dec 9, 2024
Teuken-7B-v0.4 Collection OpenGPT-X Teuken 7B models trained on 4 trillion tokens • 4 items • Updated Dec 6, 2024 • 5
MixtureVitae study models and datasets Collection Collection of models and datasets related to MixtureVitae, an open and fully reproducible pretraining dataset built from permissive sources • 16 items • Updated Feb 13 • 1
open-sci-ref-0.01 Collection Research baseline models trained on various open reference datasets. ArXiv: https://arxiv.org/abs/2509.09009 • 8 items • Updated 14 days ago • 4
ChemPile Collection The ChemPile is a dataset with over 77 billion curated multimodal tokens about chemistry. For more information, visit https://chempile.lamalab.org/. • 8 items • Updated Oct 23, 2025 • 16