ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Abstract
We investigate the logical reasoning capabilities of large language models (LLMs) and their scalability in complex non-monotonic reasoning. To this end, we introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance on logic grid puzzles derived from constraint satisfaction problems (CSPs). ZebraLogic enables the generation of puzzles with controllable and quantifiable complexity, facilitating a systematic study of the scaling limits of models such as Llama, OpenAI's o1, and DeepSeek-R1. By encompassing a broad range of search space complexities and diverse logical constraints, ZebraLogic provides a structured environment to evaluate reasoning under increasing difficulty. Our results reveal a significant decline in accuracy as problem complexity grows -- a phenomenon we term the curse of complexity. This limitation persists even with larger models and increased inference-time computation, suggesting inherent constraints in current LLM reasoning capabilities. Additionally, we explore strategies to enhance logical reasoning, including Best-of-N sampling, backtracking mechanisms, and self-verification prompts. Our findings offer critical insights into the scalability of LLM reasoning, highlight fundamental limitations, and outline potential directions for improvement.
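To make the CSP framing concrete, below is a minimal sketch of the kind of logic grid puzzle the abstract describes. The names, clues, and three-house setup are illustrative inventions, not taken from the ZebraLogic benchmark itself; the point is only that each puzzle is a finite CSP whose search space (here, permutations of each attribute over house positions) can be enumerated and checked against the clues.

```python
from itertools import permutations

# Toy 3-house logic grid puzzle (clues are hypothetical, for illustration).
# Each attribute group (names, drinks) is assigned to house positions 0..2.
names = ("Alice", "Bob", "Carol")
drinks = ("tea", "coffee", "milk")

solutions = []
for name_order in permutations(names):
    for drink_order in permutations(drinks):
        # Map every value to the index of the house it occupies.
        pos = {v: i for i, v in enumerate(name_order)}
        pos.update({v: i for i, v in enumerate(drink_order)})
        # Clue 1: Alice lives in the leftmost house.
        if pos["Alice"] != 0:
            continue
        # Clue 2: Bob drinks tea.
        if pos["Bob"] != pos["tea"]:
            continue
        # Clue 3: Milk is drunk in the middle house.
        if pos["milk"] != 1:
            continue
        solutions.append((name_order, drink_order))

# A well-posed puzzle admits exactly one satisfying assignment.
print(len(solutions), solutions[0])
# → 1 (('Alice', 'Carol', 'Bob'), ('coffee', 'milk', 'tea'))
```

The search space here is only 3! × 3! = 36 assignments; ZebraLogic's "curse of complexity" arises because adding houses and attribute groups grows this space factorially, which is the quantity the benchmark controls.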
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning (2025)
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning (2024)
- A NotSo Simple Way to Beat Simple Bench (2024)
- Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers (2025)
- JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models (2025)
- Are Your LLMs Capable of Stable Reasoning? (2024)
- LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs (2025)