arxiv:2502.13233

SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?

Published on Feb 18 · Submitted by YuchengShi on Feb 20

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in general domains but often struggle with tasks requiring specialized knowledge. Conventional Retrieval-Augmented Generation (RAG) techniques typically retrieve external information from static knowledge bases, which can be outdated or incomplete, missing fine-grained clinical details essential for accurate medical question answering. In this work, we propose SearchRAG, a novel framework that overcomes these limitations by leveraging real-time search engines. Our method employs synthetic query generation to convert complex medical questions into search-engine-friendly queries and utilizes uncertainty-based knowledge selection to filter and incorporate the most relevant and informative medical knowledge into the LLM's input. Experimental results demonstrate that our method significantly improves response accuracy in medical question answering tasks, particularly for complex questions requiring detailed and up-to-date knowledge.

Community

Paper author · Paper submitter

As large language models (LLMs) increasingly support internet-based search, the potential to access up-to-date medical information in real time has greatly expanded. However, we have observed that LLMs are not inherently adept at using search engines. If a model is only allowed to rewrite the original question into a search query once, the resulting queries are often of poor quality, leading to suboptimal retrieval results that fail to meet the demands of professional medical inquiries. This raises a critical research question: how can we enable LLMs to use search engines more effectively?
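
For concreteness, the single-rewrite baseline described above can be sketched as follows. The `llm.generate` and `search` calls are hypothetical placeholders for a chat LLM API and a web search API; they are not interfaces from the paper.

```python
# Minimal sketch of the single-rewrite RAG baseline discussed above.
# `llm.generate(prompt)` and `search(query, k)` are hypothetical placeholders,
# not code from the paper.
def single_rewrite_rag(question, llm, search):
    # One greedy rewrite of the question into a search query; retrieval
    # quality hinges entirely on this single query.
    query = llm.generate(f"Rewrite this medical question as a web search query:\n{question}")
    snippets = search(query, k=3)
    context = "\n".join(snippets)
    return llm.generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```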

Given the high cost of retraining LLMs, we adopt a more cost-efficient and scalable approach: inference-time scaling. We generate a large number of candidate queries with high-temperature sampling; the diversity of these queries increases the likelihood that at least some will be of high quality. To ensure that only the most informative retrievals are used, we introduce an uncertainty-based evaluation mechanism that measures the impact of retrieved information on the model's confidence. This allows us to filter out low-quality queries and retain only those whose results significantly reduce the model's predictive uncertainty, ensuring that the retrieved information is both accurate and relevant.
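
A minimal sketch of how this pipeline could be wired together is below. The `llm.sample`, `llm.option_logprobs`, and `search` interfaces are hypothetical placeholders, Shannon entropy over the answer options is one plausible instantiation of the "predictive uncertainty" mentioned above (not necessarily the exact measure used in the paper), and `n_queries=32` and `keep=3` are illustrative values.

```python
import math

# Hypothetical interfaces (not from the paper's released code):
#   llm.sample(prompt, n, temperature) -> list[str]           # diverse generations
#   llm.option_logprobs(prompt, options) -> dict[str, float]  # log p(option | prompt)
#   search(query, k) -> list[str]                              # top-k web snippets

def entropy(logprobs):
    """Shannon entropy of the (renormalized) distribution given as log-probabilities."""
    probs = [math.exp(lp) for lp in logprobs.values()]
    z = sum(probs)
    return -sum(p / z * math.log(p / z) for p in probs)

def searchrag_answer(question, options, llm, search, n_queries=32, keep=3):
    # 1) Inference-time scaling: sample many search queries at high temperature
    #    so that at least some are well-suited to a web search engine.
    query_prompt = f"Rewrite this medical question as a concise web search query:\n{question}"
    queries = llm.sample(query_prompt, n=n_queries, temperature=1.0)

    # 2) Retrieve one snippet per query (a real system may retrieve several).
    snippets = [search(q, k=1)[0] for q in queries]

    # 3) Uncertainty-based selection: keep the snippets that most reduce the
    #    entropy of the model's answer distribution over the options.
    base_prompt = f"{question}\nOptions: {options}\nAnswer:"
    base_h = entropy(llm.option_logprobs(base_prompt, options))
    scored = []
    for s in snippets:
        aug_prompt = f"Context: {s}\n{base_prompt}"
        scored.append((base_h - entropy(llm.option_logprobs(aug_prompt, options)), s))
    selected = [s for gain, s in sorted(scored, reverse=True)[:keep] if gain > 0]

    # 4) Answer with the selected, uncertainty-reducing knowledge in context.
    final_prompt = "Context:\n" + "\n".join(selected) + "\n" + base_prompt
    return llm.sample(final_prompt, n=1, temperature=0.0)[0]
```

Keeping only snippets with a positive entropy reduction means irrelevant or misleading retrievals are simply dropped rather than diluting the context.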

Our experiments further validate this approach: as the number of search queries increases from 0 to 32, the overall performance of the retrieval-augmented generation (RAG) framework steadily improves, aligning well with our initial hypothesis. Through this method, we not only compensate for the limitations of LLMs in utilizing search engines but also achieve significant performance improvements without the need for additional model training.
