Pre-trained Large Language Models Learn Hidden Markov Models In-context
Abstract
In-context learning in pre-trained large language models can effectively model sequences generated by hidden Markov models, achieving near-optimal predictive accuracy and revealing scaling trends, which establishes it as a practical diagnostic tool for complex scientific data.
Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL): their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn and predict HMM-generated sequences, an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.
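To make the setup concrete, here is a minimal sketch (not the paper's code) of the evaluation the abstract describes: sample a sequence from a synthetic HMM, compute the optimal next-symbol distribution with the forward algorithm as the theoretical baseline, and serialize the sequence as a plain-text prompt an LLM would be asked to continue. The HMM parameters, sequence length, and prompt format below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 3-symbol HMM parameters (illustrative only).
A = np.array([[0.9, 0.1],          # state transition matrix
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],     # emission matrix (states x symbols)
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])          # initial state distribution

def sample_hmm(T):
    """Draw a length-T observation sequence from the HMM."""
    obs, s = [], rng.choice(2, p=pi)
    for _ in range(T):
        obs.append(int(rng.choice(3, p=B[s])))
        s = rng.choice(2, p=A[s])
    return obs

def oracle_next_dist(obs):
    """Forward algorithm: P(next symbol | observed prefix), the optimal predictor."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        alpha /= alpha.sum()
    return (alpha @ A) @ B          # predictive distribution over the 3 symbols

seq = sample_hmm(200)
print("oracle next-symbol distribution:", oracle_next_dist(seq))

# The in-context prompt is just the observations as text; an LLM's next-token
# probabilities over "0", "1", "2" can then be compared against the oracle.
prompt = " ".join(map(str, seq))
```

Comparing the LLM's next-token probabilities to the oracle distribution (e.g., via KL divergence or log-loss gap) over many prefixes is one simple way to quantify how closely ICL approaches the theoretical optimum for a given HMM.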
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models (2025)
- TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation (2025)
- Advanced posterior analyses of hidden Markov models: finite Markov chain imbedding and hybrid decoding (2025)
- RLVR-World: Training World Models with Reinforcement Learning (2025)
- Maximum Likelihood Learning of Latent Dynamics Without Reconstruction (2025)
- Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning (2025)
- Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting (2025)