Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Abstract
This survey examines innovative architectures that improve the efficiency of large language models, covering linear and sparse sequence modeling, efficient attention mechanisms, sparse mixture-of-experts, hybrid models, and diffusion LLMs.
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundaries of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional transformer architecture requires substantial computation, posing significant obstacles to large-scale training and practical deployment. In this survey, we offer a systematic examination of innovative LLM architectures that address the inherent limitations of transformers and boost efficiency. Starting from language modeling, this survey covers the background and technical details of linear and sparse sequence modeling methods, efficient full-attention variants, sparse mixture-of-experts, hybrid model architectures incorporating the above techniques, and emerging diffusion LLMs. Additionally, we discuss applications of these techniques to other modalities and consider their wider implications for developing scalable, resource-aware foundation models. By grouping recent studies into the above categories, this survey presents a blueprint of modern efficient LLM architectures, and we hope it helps motivate future research toward more efficient, versatile AI systems.
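As a toy illustration of one technique family named in the abstract (linear sequence modeling), below is a minimal, non-causal linear-attention sketch in PyTorch. It assumes the common elu(x)+1 feature map and hypothetical toy tensor shapes; it is not code from the survey or any work it covers, only a sketch of the general idea of reordering the attention matrix multiplications so the cost grows linearly, rather than quadratically, with sequence length.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Minimal non-causal linear attention (a sketch, not a reference implementation).

    Replaces softmax(QK^T)V, which costs O(n^2) in sequence length n,
    with phi(Q)(phi(K)^T V), which costs O(n) by reordering the matmuls.
    phi is the elu(x)+1 feature map; tensors are (batch, heads, seq_len, head_dim).
    """
    phi_q = F.elu(q) + 1.0
    phi_k = F.elu(k) + 1.0
    # Summarize keys/values first: (b, h, d, d_v), independent of seq_len afterwards.
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)
    # Normalizer: phi(Q) dotted with the sum of phi(K) over positions.
    z = torch.einsum("bhnd,bhd->bhn", phi_q, phi_k.sum(dim=2)) + eps
    # Attend via the compact kv summary instead of an n x n attention matrix.
    out = torch.einsum("bhnd,bhde->bhne", phi_q, kv) / z.unsqueeze(-1)
    return out

# Toy usage with hypothetical shapes: batch=1, heads=2, seq_len=16, head_dim=8.
q = torch.randn(1, 2, 16, 8)
k = torch.randn(1, 2, 16, 8)
v = torch.randn(1, 2, 16, 8)
print(linear_attention(q, k, v).shape)  # torch.Size([1, 2, 16, 8])
```

The key design choice is computing the compact phi(K)^T V summary before multiplying by phi(Q), so the full n x n attention matrix of standard softmax attention is never materialized.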
Community
The Librarian Bot (automated) found the following similar papers, recommended by the Semantic Scholar API:
- Efficient Attention Mechanisms for Large Language Models: A Survey (2025)
- Mixture of Experts in Large Language Models (2025)
- Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation (2025)
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo (2025)
- A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges (2025)
- A Survey of Context Engineering for Large Language Models (2025)
- Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance (2025)