Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
Abstract
The ability to generate SPARQL queries from natural language questions is crucial for ensuring efficient and accurate retrieval of structured data from knowledge graphs (KGs). While large language models (LLMs) have been widely adopted for SPARQL query generation, they are often susceptible to hallucinations and out-of-distribution errors when producing KG elements like Uniform Resource Identifiers (URIs) based on internal parametric knowledge. This often results in content that appears plausible but is factually incorrect, posing significant challenges for their use in real-world information retrieval (IR) applications. These challenges have led to increased research aimed at detecting and mitigating such errors. In this paper, we introduce PGMR (Post-Generation Memory Retrieval), a modular framework that incorporates a non-parametric memory module to retrieve KG elements and enhance LLM-based SPARQL query generation. Our experimental results indicate that PGMR consistently delivers strong performance across diverse datasets, data distributions, and LLMs. Notably, PGMR significantly mitigates URI hallucinations, nearly eliminating the problem in several scenarios.
Community
LLMs are powerful tools for generating SPARQL queries from natural language, but they have a major flaw: hallucinations. They often fabricate Uniform Resource Identifiers (URIs), the unique IDs that reference entities and relationships in knowledge graphs, which leads to incorrect or unusable queries.
We introduce PGMR (Post-Generation Memory Retrieval), a novel approach that nearly eliminates such hallucinations by separating query structure generation from knowledge retrieval. Instead of trusting an LLM’s internal memory, PGMR retrieves the correct URIs post-generation, ensuring factual accuracy and dramatically improving query reliability.
Our results? Near-complete elimination of URI hallucinations across multiple datasets, LLM architectures, and data distributions.
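To make the idea of separating query structure from URI retrieval concrete, here is a minimal sketch in Python. It is not the paper's implementation. The `[[label]]` placeholder syntax, the `ex:` URIs, the `MEMORY` table, and the helper names `retrieve_uri` and `ground_query` are all assumptions made for illustration. A simple string-similarity lookup stands in for whatever retriever the actual memory module uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical non-parametric memory: natural-language label -> KG URI.
# The "ex:" identifiers are placeholders, not real knowledge-graph URIs.
MEMORY = {
    "author": "ex:author",
    "publication date": "ex:publicationDate",
    "Dune": "ex:Dune",
}

# Assumed convention: the LLM writes [[label]] wherever a URI is needed,
# so it never has to recall a URI from its parameters.
PLACEHOLDER = re.compile(r"\[\[(.+?)\]\]")


def retrieve_uri(label, memory=MEMORY):
    """Return the URI whose stored label is most similar to `label`."""
    def score(candidate):
        return SequenceMatcher(None, label.lower(), candidate.lower()).ratio()

    best_label = max(memory, key=score)
    return memory[best_label]


def ground_query(draft, memory=MEMORY):
    """Replace every [[label]] placeholder with a URI taken from memory."""
    return PLACEHOLDER.sub(lambda m: retrieve_uri(m.group(1), memory), draft)


# A draft query as an LLM might produce it under the assumed convention.
draft = "SELECT ?a WHERE { [[dune]] [[authors]] ?a }"
print(ground_query(draft))
```

Because every identifier in the final query is copied out of the memory table, this sketch cannot emit a URI that does not exist in the store. A wrong match is still possible, so a real system would need a stronger retriever than string similarity.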