A^2Search: Ambiguity-Aware Question Answering with Reinforcement Learning
Abstract
A^2Search is an annotation-free framework that handles ambiguity in open-domain QA by detecting ambiguous questions, gathering alternative answers, and optimizing with RL, achieving state-of-the-art performance across benchmarks.
Recent advances in Large Language Models (LLMs) and Reinforcement Learning (RL) have led to strong performance in open-domain question answering (QA). However, existing models still struggle with questions that admit multiple valid answers. Standard QA benchmarks, which typically assume a single gold answer, overlook this reality and thus produce inappropriate training signals. Existing attempts to handle ambiguity often rely on costly manual annotation, which is difficult to scale to multi-hop datasets such as HotpotQA and MuSiQue. In this paper, we present A^2Search, an annotation-free, end-to-end training framework that recognizes and handles ambiguity. At its core is an automated pipeline that detects ambiguous questions and gathers alternative answers via trajectory sampling and evidence verification. The model is then optimized with RL using a carefully designed AnsF1 reward, which naturally accommodates multiple answers. Experiments on eight open-domain QA benchmarks demonstrate that A^2Search achieves new state-of-the-art performance. With only a single rollout, A^2Search-7B yields an average AnsF1@1 score of 48.4% across four multi-hop benchmarks, outperforming all strong baselines, including the substantially larger ReSearch-32B (46.2%). Extensive analyses further show that A^2Search resolves ambiguity and generalizes across benchmarks, highlighting that embracing ambiguity is essential for building more reliable QA systems. Our code, data, and model weights are available at https://github.com/zfj1998/A2Search.
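To make the reward concrete, here is a minimal sketch of a set-level answer-F1 reward of the kind described above, assuming exact-match comparison after SQuAD-style answer normalization; the function names and the matching criterion are illustrative assumptions rather than the paper's exact implementation, which may use token-level matching or a different normalization.

```python
# Hedged sketch: a set-level Answer F1 (AnsF1) reward for multi-answer QA.
# Assumption: answers are compared by exact match after simple normalization;
# the actual AnsF1 used in the paper may differ in these details.
import re
import string


def normalize(ans: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    ans = ans.lower()
    ans = "".join(ch for ch in ans if ch not in string.punctuation)
    ans = re.sub(r"\b(a|an|the)\b", " ", ans)
    return " ".join(ans.split())


def ans_f1_reward(predicted: list[str], references: list[str]) -> float:
    """F1 between the predicted answer set and the reference answer set.

    Precision: fraction of predictions that match some reference answer.
    Recall:    fraction of reference answers covered by some prediction.
    """
    pred = {normalize(p) for p in predicted if p.strip()}
    ref = {normalize(r) for r in references if r.strip()}
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# A question with two valid reference answers: a model that returns one of them
# plus an unsupported guess still earns partial credit rather than zero.
print(ans_f1_reward(["Paris", "Lyon"], ["Paris", "Ile-de-France"]))  # 0.5
```

Because the reward rises with recall over the reference set, a policy trained against it is encouraged to surface all plausible answers rather than commit to a single one.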
Community
Thank you for your interest in our paper!
Our work addresses a key limitation of popular single- and multi-hop question answering (QA) datasets: many questions are inherently ambiguous and admit multiple valid answers, yet these datasets typically provide only a single gold-standard answer. This hinders both the training and evaluation of search agents. To tackle it, we propose an annotation-free, scalable data construction pipeline that enriches existing QA datasets with alternative valid answers (a rough sketch follows below). We then train search agents with reinforcement learning (RL), guided by an Answer F1 (AnsF1) reward, to interact with search tools and actively recognize and navigate answer ambiguity by retrieving alternative answers. Our results are highly promising and underscore a critical insight: ambiguity in QA cannot be overlooked when developing robust question-answering systems.
Our trained models are available at https://huggingface.co/collections/zfj1998/a2search-68e75d8370c5f9219395d0eb
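As a rough illustration of the data construction pipeline mentioned above, the sketch below assumes the agent exposes a rollout function and an evidence-verification judge, passed in as callables; `Trajectory`, `sample_trajectory`, and `supported_by` are hypothetical names for illustration only and do not reflect the released code's API.

```python
# Hedged sketch of an annotation-free answer-expansion step: sample several
# agent trajectories per question and keep candidate answers only when the
# retrieved evidence supports them. Interfaces here are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    """One agent rollout: its final answer and the passages it retrieved."""
    final_answer: str
    retrieved_passages: list[str]


def expand_answers(
    question: str,
    gold_answer: str,
    sample_trajectory: Callable[[str], Trajectory],        # hypothetical: one search-agent rollout
    supported_by: Callable[[str, str, list[str]], bool],   # hypothetical: evidence-verification judge
    n_rollouts: int = 8,
) -> list[str]:
    """Gather alternative valid answers for a potentially ambiguous question."""
    answers = {gold_answer}
    for _ in range(n_rollouts):
        trajectory = sample_trajectory(question)
        candidate = trajectory.final_answer.strip()
        if not candidate or candidate in answers:
            continue
        # Keep a new candidate only if the retrieved evidence supports it.
        if supported_by(candidate, question, trajectory.retrieved_passages):
            answers.add(candidate)
    return sorted(answers)
```

The expanded answer sets produced this way can then serve as references for the AnsF1 reward during RL training.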
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents (2025)
- QAgent: A modular Search Agent with Interactive Query Understanding (2025)
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL (2025)
- EviNote-RAG: Enhancing RAG Models via Answer-Supportive Evidence Notes (2025)
- HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation (2025)
- Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks (2025)
- AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play (2025)