LLM pretraining datasets Collection A collection of datasets for LLM pretraining • 9 items • Updated May 5 • 9
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 52