FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents Paper • 2504.13128 • Published Apr 17, 2025
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses Paper • 2504.20006 • Published Apr 28, 2025
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval Paper • 2505.16967 • Published May 22, 2025
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems Paper • 2410.13716 • Published Oct 17, 2024
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track Paper • 2406.16828 • Published Jun 24, 2024
Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard Paper • 2306.07471 • Published Jun 13, 2023
NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation Paper • 2312.11361 • Published Dec 18, 2023
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution Paper • 2307.16883 • Published Jul 31, 2023
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks Paper • 2010.08240 • Published Oct 16, 2020
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval Paper • 2112.07577 • Published Dec 14, 2021
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages Paper • 2210.09984 • Published Oct 18, 2022
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models Paper • 2104.08663 • Published Apr 17, 2021
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval Paper • 2311.05800 • Published Nov 10, 2023