Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • arXiv:2404.02258 • Published
Note: This paper shows that LLMs should not be used as libraries! They are not knowledge engines, but rather (feeble) reasoning engines.
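For context, the paper's core mechanism is per-block token routing: a learned router scores each token, the top-k tokens (by a capacity fraction) pass through the full attention/MLP block, and the rest skip it via the residual connection. The sketch below is a simplified, hypothetical NumPy illustration of that routing idea (the function and variable names are my own, not from the paper; the real method uses a trained router inside a transformer):

```python
import numpy as np

def mixture_of_depths_block(x, w_router, block_fn, capacity=0.5):
    """Sketch of Mixture-of-Depths routing (hypothetical helper, not the paper's code).

    x        : (seq_len, d_model) token activations
    w_router : (d_model,) router weights giving one scalar score per token
    block_fn : the heavy computation (stand-in for attention + MLP)
    capacity : fraction of tokens allowed through the block
    """
    seq_len = x.shape[0]
    k = max(1, int(seq_len * capacity))
    scores = x @ w_router                 # one routing score per token
    top_idx = np.argsort(scores)[-k:]     # top-k tokens get full compute
    out = x.copy()                        # the rest skip via the residual path
    # scale the block output by the router score so routing stays differentiable
    out[top_idx] = x[top_idx] + scores[top_idx, None] * block_fn(x[top_idx])
    return out

# toy usage: a cheap stand-in "block" on random activations
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
w = rng.standard_normal(4)
y = mixture_of_depths_block(x, w, lambda h: 0.1 * h, capacity=0.5)
print(y.shape)
```

With `capacity=0.5`, only 4 of the 8 tokens are processed by `block_fn`; the other 4 rows of `y` are identical to `x`, which is where the compute savings come from.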