SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in general domains but often struggle with tasks requiring specialized knowledge. Conventional Retrieval-Augmented Generation (RAG) techniques typically retrieve external information from static knowledge bases, which can be outdated or incomplete, missing fine-grained clinical details essential for accurate medical question answering. In this work, we propose SearchRAG, a novel framework that overcomes these limitations by leveraging real-time search engines. Our method employs synthetic query generation to convert complex medical questions into search-engine-friendly queries and utilizes uncertainty-based knowledge selection to filter and incorporate the most relevant and informative medical knowledge into the LLM's input. Experimental results demonstrate that our method significantly improves response accuracy in medical question answering tasks, particularly for complex questions requiring detailed and up-to-date knowledge.
Community
As large language models (LLMs) increasingly support internet-based search capabilities, the potential to access up-to-date medical information in real time has greatly expanded. However, we have observed that LLMs are not inherently adept at using search engines. If a model is allowed to rewrite the original question only once, the resulting search queries are often of poor quality, leading to suboptimal retrieval results that fail to meet the demands of professional medical inquiries. This raises a critical research question: how can we enable LLMs to use search engines more effectively?
Given the high cost of retraining LLMs, we adopt a more cost-efficient and scalable approach: inference-time scaling. In this approach, we generate a large number of queries under high-temperature sampling conditions. By leveraging the diversity of these generated queries, we increase the likelihood that at least some will be of high quality. To ensure that only the most informative queries are used, we introduce an uncertainty-based evaluation mechanism that measures the impact of retrieved information on the model's confidence. This allows us to filter out low-quality queries and retain only those that significantly reduce the model's predictive uncertainty, ensuring that the retrieved information is both accurate and relevant.
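The selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_query`, `retrieve`, and `get_answer_probs` are hypothetical callables standing in for the LLM's query sampler, the search engine, and the model's answer distribution over options; uncertainty is measured here as Shannon entropy, and only retrieved snippets that lower entropy relative to the no-context baseline are kept.

```python
import math

def answer_entropy(probs):
    """Shannon entropy (in nats) of the model's answer distribution.
    Lower entropy means the model is more confident in one answer."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_knowledge(question, generate_query, retrieve, get_answer_probs,
                     num_queries=32, temperature=1.2):
    """Sketch of uncertainty-based knowledge selection.

    Hypothetical callables (assumptions, not the paper's API):
      generate_query(question, temperature) -> a search-engine-friendly query
      retrieve(query)                       -> a retrieved text snippet
      get_answer_probs(question, context)   -> probabilities over answer options
    """
    # Baseline uncertainty with no retrieved context.
    base_entropy = answer_entropy(get_answer_probs(question, context=None))

    scored = []
    for _ in range(num_queries):
        # High-temperature sampling encourages diverse query rewrites.
        query = generate_query(question, temperature)
        snippet = retrieve(query)
        entropy = answer_entropy(get_answer_probs(question, context=snippet))
        # Keep only snippets that reduce the model's predictive uncertainty.
        if entropy < base_entropy:
            scored.append((base_entropy - entropy, snippet))

    # Most informative snippets (largest entropy reduction) first.
    scored.sort(reverse=True)
    return [snippet for _, snippet in scored]
```

In this sketch, raising `num_queries` directly trades more search-engine calls for a better chance of finding at least one high-quality query, which mirrors the inference-time scaling behavior discussed next.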
Our experiments further validate this approach: as the number of search queries increases from 0 to 32, the overall performance of the retrieval-augmented generation (RAG) framework steadily improves, aligning well with our initial hypothesis. Through this method, we not only compensate for the limitations of LLMs in utilizing search engines but also achieve significant performance improvements without the need for additional model training.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge (2025)
- Vendi-RAG: Adaptively Trading-Off Diversity And Quality Significantly Improves Retrieval Augmented Generation With LLMs (2025)
- Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications (2025)
- Enhancing Retrieval-Augmented Generation: A Study of Best Practices (2025)
- MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation (2025)
- K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor (2025)
- Federated Retrieval Augmented Generation for Multi-Product Question Answering (2025)