TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving
Abstract
TeleMath is a benchmark dataset for evaluating Large Language Models on domain-specific mathematical problems in telecommunications; it shows that models designed for mathematical reasoning outperform general-purpose models.
The increasing adoption of artificial intelligence in telecommunications has raised interest in the capability of Large Language Models (LLMs) to address domain-specific, mathematically intensive tasks. Although recent advancements have improved the performance of LLMs in general mathematical reasoning, their effectiveness within specialized domains, such as signal processing, network optimization, and performance analysis, remains largely unexplored. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. Comprising 500 question-answer (QnA) pairs, TeleMath covers a wide spectrum of topics in the telecommunications field. This paper outlines the proposed QnA generation pipeline, starting from a selected seed of problems crafted by Subject Matter Experts. The evaluation of a wide range of open-source LLMs reveals that the best performance on TeleMath is achieved by recent models explicitly designed for mathematical or logical reasoning. In contrast, general-purpose models, even those with a large number of parameters, often struggle with these challenges. We have released the dataset and the evaluation code to ease result reproducibility and support future research.
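Since TeleMath problems have numerical answers, evaluation reduces to comparing a model's extracted number against the reference value. The snippet below is a minimal sketch of such a check; the 1% relative tolerance and the Shannon-capacity example are illustrative assumptions, not TeleMath's official grading criterion.

```python
import math

def is_correct(predicted: float, reference: float, rel_tol: float = 1e-2) -> bool:
    """Judge a model's numerical answer against the reference value.

    A relative tolerance absorbs rounding differences in the model's
    output; a tiny absolute tolerance handles references near zero.
    The 1% threshold is an assumed choice, not the benchmark's.
    """
    return math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=1e-9)

# Example telecom-style problem: Shannon capacity C = B * log2(1 + SNR)
# for B = 1 MHz and a linear SNR of 15, so C = 4.0 Mbit/s exactly.
reference = 1e6 * math.log2(1 + 15)
print(is_correct(4.0e6, reference))   # close enough -> correct
print(is_correct(3.0e6, reference))   # 25% off -> incorrect
```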
Community
We present TeleMath, a novel benchmark dataset designed to evaluate the mathematical reasoning capabilities of LLMs within the telecommunications domain. At its core is a domain-agnostic synthetic data generation framework that expands a small seed dataset into 500 diverse and challenging problems. This flexible pipeline is easily adaptable to other fields, promoting broader research into specialized AI capabilities. TeleMath is publicly released to encourage further advancements in telecom-specific LLMs.
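One common way to expand a small seed set into many verifiable QnA pairs is to template each seed problem and resample its numeric parameters, recomputing the ground-truth answer with a solver. The sketch below illustrates that general idea only; the template, parameter grids, and solver are hypothetical and are not taken from the paper's actual pipeline.

```python
import math
import random

# Hypothetical seed: a templated telecom problem whose numeric parameters
# can be resampled to yield new QnA pairs with recomputable answers.
seed = {
    "template": ("A link has bandwidth {B} MHz and a linear SNR of {snr}. "
                 "What is the Shannon capacity in Mbit/s?"),
    "solve": lambda B, snr: B * math.log2(1 + snr),  # answer in Mbit/s
}

def expand(seed: dict, n: int, rng: random.Random) -> list[dict]:
    """Generate n QnA variants by resampling the seed's parameters."""
    pairs = []
    for _ in range(n):
        B = rng.choice([1, 5, 10, 20])      # assumed parameter grid
        snr = rng.choice([3, 7, 15, 31])
        pairs.append({
            "question": seed["template"].format(B=B, snr=snr),
            "answer": round(seed["solve"](B, snr), 3),
        })
    return pairs

variants = expand(seed, 5, random.Random(0))
for v in variants:
    print(v["question"], "->", v["answer"])
```

In practice each generated answer is recomputed symbolically or numerically from the sampled parameters, so every variant remains automatically gradable.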
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems (2025)
- ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark (2025)
- Evaluation of LLMs for mathematical problem solving (2025)
- VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models (2025)
- EasyMath: A 0-shot Math Benchmark for SLMs (2025)
- CoRT: Code-integrated Reasoning within Thinking (2025)
- DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning (2025)
Datasets citing this paper: 1