Improving large language models with concept-aware fine-tuning
Abstract
Concept-Aware Fine-Tuning (CAFT) enhances large language models by enabling multi-token learning during fine-tuning, leading to more coherent conceptual understanding and better performance across a range of tasks.
Large language models (LLMs) have become the cornerstone of modern AI. However, the existing paradigm of next-token prediction fundamentally limits their ability to form coherent, high-level concepts, making it a critical barrier to human-like understanding and reasoning. Take the phrase "ribonucleic acid" as an example: an LLM will first decompose it into tokens, i.e., artificial text fragments ("rib", "on", ...), then learn each token sequentially, rather than grasping the phrase as a unified, coherent semantic entity. This fragmented representation hinders deeper conceptual understanding and, ultimately, the development of truly intelligent systems. In response, we introduce Concept-Aware Fine-Tuning (CAFT), a novel multi-token training method that redefines how LLMs are fine-tuned. By enabling the learning of sequences that span multiple tokens, this method fosters stronger concept-aware learning. Our experiments demonstrate significant improvements over conventional next-token fine-tuning methods across diverse tasks, including traditional applications like text summarization and domain-specific ones like de novo protein design. Multi-token prediction was previously only possible in the prohibitively expensive pretraining phase; CAFT, to our knowledge, is the first to bring the multi-token setting to the post-training phase, effectively democratizing its benefits for the broader community of practitioners and researchers. Finally, the unexpected effectiveness of our proposed method suggests wider implications for the machine learning research community. All code and data are available at https://github.com/michaelchen-lab/caft-llm.
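To make the idea of "multi-token" training concrete, below is a minimal PyTorch sketch of the general multi-token prediction setup (in the spirit of Meta's and DeepSeek's pretraining work): auxiliary heads predict tokens t+1 through t+K from the same hidden state, and their cross-entropy losses are summed during fine-tuning. The class and function names, the head design, and the `n_future` parameter are illustrative assumptions, not the exact CAFT architecture described in the paper or repository.

```python
# Minimal sketch of a multi-token prediction fine-tuning loss.
# Illustrative only: `MultiTokenHeads`, `n_future`, and the head design
# are assumptions, not the paper's exact CAFT implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTokenHeads(nn.Module):
    """Auxiliary unembedding heads that predict tokens t+1 ... t+K
    from the same hidden state, alongside the usual next-token head."""

    def __init__(self, hidden_size: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size, bias=False) for _ in range(n_future)
        )

    def forward(self, hidden_states: torch.Tensor) -> list[torch.Tensor]:
        # hidden_states: (batch, seq_len, hidden_size)
        return [head(hidden_states) for head in self.heads]


def multi_token_loss(hidden_states, input_ids, heads, ignore_index=-100):
    """Sum of cross-entropy losses for predicting tokens k steps ahead."""
    total = 0.0
    for k, logits in enumerate(heads(hidden_states), start=1):
        # Position i predicts token i + k; drop the last k positions.
        shifted_logits = logits[:, :-k, :]
        shifted_labels = input_ids[:, k:]
        total = total + F.cross_entropy(
            shifted_logits.reshape(-1, shifted_logits.size(-1)),
            shifted_labels.reshape(-1),
            ignore_index=ignore_index,
        )
    return total
```

In a fine-tuning loop, this auxiliary loss would typically be added (possibly with a weighting coefficient) to the standard next-token loss computed from the base model's hidden states; the auxiliary heads can be discarded at inference time so the deployed model is unchanged.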
Community
Always wanted to try the multi-token prediction methods from DeepSeek V3 and Meta? CAFT brings them to fine-tuning, improving model performance through stronger conceptual understanding.
This is an automated message from Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Multi-Token Prediction Needs Registers (2025)
- Learning to Insert [PAUSE] Tokens for Better Reasoning (2025)
- Improving In-Context Learning with Reasoning Distillation (2025)
- zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression (2025)
- Pretraining Language Models to Ponder in Continuous Space (2025)
- Pre-Training Curriculum for Multi-Token Prediction in Language Models (2025)
- GEM: Empowering LLM for both Embedding Generation and Language Understanding (2025)
Models citing this paper: 1
Datasets citing this paper: 1