Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates Paper • 2206.00832 • Published Jun 2, 2022
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms Paper • 2311.13133 • Published Nov 22, 2023
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining Paper • 2312.17482 • Published Dec 29, 2023 • 1