arxiv:2506.10674

TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving

Published on Jun 12
· Submitted by vincolle on Jun 13

Abstract

TeleMath is a benchmark dataset for evaluating Large Language Models on domain-specific mathematical problems in telecommunications; on it, models designed for mathematical reasoning outperform general-purpose models.

AI-generated summary

The increasing adoption of artificial intelligence in telecommunications has raised interest in the capability of Large Language Models (LLMs) to address domain-specific, mathematically intensive tasks. Although recent advancements have improved the performance of LLMs in general mathematical reasoning, their effectiveness within specialized domains, such as signal processing, network optimization, and performance analysis, remains largely unexplored. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. Comprising 500 question-answer (QnA) pairs, TeleMath covers a wide spectrum of topics in the telecommunications field. This paper outlines the proposed QnA generation pipeline, starting from a selected seed of problems crafted by Subject Matter Experts. The evaluation of a wide range of open-source LLMs reveals that the best performance on TeleMath is achieved by recent models explicitly designed for mathematical or logical reasoning. In contrast, general-purpose models, even those with a large number of parameters, often struggle with these challenges. We have released the dataset and the evaluation code to ease result reproducibility and support future research.

Community

Paper author and submitter:

We present TeleMath, a novel benchmark dataset designed to evaluate the mathematical reasoning capabilities of LLMs within the telecommunications domain. At its core is a domain-agnostic synthetic data generation framework that expands a small seed dataset into 500 diverse and challenging problems. This flexible pipeline is easily adaptable to other fields, promoting broader research into specialized AI capabilities. TeleMath is publicly released to encourage further advancements in telecom-specific LLMs.
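Since TeleMath problems have numerical answers, evaluation reduces to comparing a model's predicted number against the ground truth. The sketch below illustrates one plausible scoring loop; the record fields (`question`, `answer`), the sample telecom problem, and the use of a relative tolerance are all illustrative assumptions, not the paper's actual schema or protocol.

```python
import math

# Hypothetical TeleMath-style records: a question plus a numeric ground-truth
# answer. Field names are illustrative, not the dataset's actual schema.
qna_pairs = [
    {"question": "A channel has bandwidth 1 MHz and a linear SNR of 15. "
                 "What is the Shannon capacity in Mbit/s?",
     "answer": 4.0},
]

def shannon_capacity_mbps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon capacity C = B * log2(1 + SNR), converted to Mbit/s."""
    return bandwidth_hz * math.log2(1.0 + snr_linear) / 1e6

def is_correct(predicted: float, target: float, rel_tol: float = 1e-2) -> bool:
    """Score a numeric prediction with a relative tolerance -- a common
    choice when benchmark answers are real-valued."""
    return math.isclose(predicted, target, rel_tol=rel_tol)

# Stand-in for a model's output: here we compute the answer directly.
pred = shannon_capacity_mbps(1e6, 15.0)  # B * log2(16) / 1e6 = 4.0
accuracy = sum(is_correct(pred, p["answer"]) for p in qna_pairs) / len(qna_pairs)
```

A tolerance-based comparison avoids penalizing answers that differ only by rounding, which matters for real-valued quantities like capacities or bit-error rates.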



Models citing this paper 0


Datasets citing this paper 1

Spaces citing this paper 0


Collections including this paper 1