arxiv:2406.17158

DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs

Published on Jun 24, 2024
Authors:

AI-generated summary

A benchmark and toolkit for evaluating the retrieval and reasoning capabilities of models on complex open-domain QA tasks, highlighting the need for improved retrieval performance.

Abstract

Open-domain complex Question Answering (QA) is a difficult task, with challenges in evidence retrieval and reasoning. The complexity of such questions can stem from compositional questions, hybrid evidence, or ambiguity in the questions themselves. While retrieval performance for classical QA tasks is well explored, the capabilities of retrieval models on heterogeneous complex retrieval tasks, especially in an open-domain setting, and their impact on downstream QA performance, remain relatively unexplored. To address this, we propose a benchmark comprising diverse complex QA tasks and provide a toolkit to evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting. We observe that late interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models. In addition, since context-based reasoning is critical for solving complex QA tasks, we also evaluate the reasoning capabilities of LLMs and the impact of retrieval performance on their reasoning. Through experiments, we observe that much progress remains to be made in retrieval for complex QA to improve downstream QA performance. Our software and related data can be accessed at https://github.com/VenkteshV/DEXTER.
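
The kind of open-domain retrieval evaluation described in the abstract can be illustrated with a minimal sketch: score a small corpus with a sparse lexical retriever (BM25) and measure recall@k against gold evidence. The snippet below uses the rank_bm25 package; the toy corpus, questions, and recall_at_k helper are hypothetical stand-ins for illustration only, not the DEXTER toolkit's actual API.

```python
# Minimal sketch of sparse (BM25) retrieval evaluation, in the spirit of the
# benchmark described above. The corpus, queries, and recall_at_k helper are
# hypothetical examples, not part of the DEXTER toolkit.
from rank_bm25 import BM25Okapi

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Gustave Eiffel's company designed and built the tower.",
    "The Statue of Liberty was a gift from France to the United States.",
]
queries = [
    # (question, indices of gold evidence documents in `corpus`)
    ("Who built the Eiffel Tower and when was it finished?", {0, 1}),
    ("Which country gave the Statue of Liberty to the US?", {2}),
]

# Build the BM25 index over whitespace-tokenized, lowercased documents.
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold evidence documents found among the top-k results."""
    return len(set(ranked_ids[:k]) & gold_ids) / len(gold_ids)

k = 2
for question, gold in queries:
    scores = bm25.get_scores(question.lower().split())
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    print(f"{question!r}: recall@{k} = {recall_at_k(ranked, gold, k):.2f}")
```

Dense or late-interaction retrievers would slot into the same loop by replacing the BM25 scoring step with embedding similarity, which is how such models can be compared under a common recall@k metric.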
