Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Abstract
A causal representation learning framework identifies a concise causal structure to explain performance variations in language models across benchmarks by controlling for base model variations.
Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
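To make the modeling setup concrete, below is a minimal sketch (not the authors' implementation) of the pipeline the abstract describes: benchmark scores are residualized against the base model, treated as a common confounder, and the residuals are explained by a small number of latent capability factors via a linear factor model. The function name latent_capabilities, the inputs scores and base_model, and the use of scikit-learn's FactorAnalysis are illustrative assumptions; the causal-discovery step among the factors is only indicated in a comment.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import FactorAnalysis

def latent_capabilities(scores, base_model, n_factors=3):
    """Sketch of a linear latent-factor analysis of benchmark scores.

    scores     : (n_models, n_benchmarks) array of benchmark results
    base_model : (n_models,) array naming each model's base model
    """
    # 1) Control for the base model (common confounder) by regressing
    #    a one-hot encoding of it out of the benchmark scores.
    B = OneHotEncoder().fit_transform(np.asarray(base_model).reshape(-1, 1)).toarray()
    residuals = scores - LinearRegression().fit(B, scores).predict(B)

    # 2) Linear latent-factor model: residual scores are modeled as a linear
    #    transformation of a few latent capability factors plus noise.
    fa = FactorAnalysis(n_components=n_factors)
    Z = fa.fit_transform(residuals)   # latent factor scores per model
    W = fa.components_                # benchmark loadings per factor

    # 3) The causal structure among the factors (in the paper: general
    #    problem solving -> instruction following -> mathematical reasoning)
    #    would then be estimated from Z with a linear causal-discovery
    #    method; that step is omitted in this sketch.
    return Z, W
```

Regressing out the base-model encoding before extracting factors is the step the abstract emphasizes: without controlling for this common confounder, apparent dependencies among the latent capabilities could simply reflect shared base-model ancestry rather than causal structure.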
Community
The following papers, recommended by the Semantic Scholar API (via Librarian Bot), are similar to this paper:
- Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training (2025)
- Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance (2025)
- Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study (2025)
- Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks (2025)
- Growing Through Experience: Scaling Episodic Grounding in Language Models (2025)
- Resa: Transparent Reasoning Models via SAEs (2025)
- CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2025)