Byte Latent Transformer: Patches Scale Better Than Tokens. arXiv:2412.09871, published Dec 13, 2024.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. arXiv:2404.08801, published Apr 12, 2024.
Instruction-tuned Language Models are Better Knowledge Learners. arXiv:2402.12847, published Feb 20, 2024.
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming. arXiv:2311.07689, published Nov 13, 2023.
In-Context Pretraining: Language Modeling Beyond Document Boundaries. arXiv:2310.10638, published Oct 16, 2023.