license: mit
datasets:
- unicamp-dl/mmarco
language:
- zh
base_model:
- unicamp-dl/mt5-base-mmarco-v2
mt5-base Reranker ZH mMARCO/v2 50/50 Native Transliterated Queries
This model is a variant of Unicamp's mt5-base reranker, which was originally fine-tuned on mMARCO/v2.
The queries are a 50/50 split between native-script Chinese and Chinese transliterated into Latin script with uroman.
The model was used in the SIGIR 2025 short paper "Lost in Transliteration: Bridging the Script Gap in Neural IR".
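
The 50/50 mix of native and transliterated queries described above could be built roughly as follows. This is a minimal sketch, not the authors' actual preprocessing: the function name `mix_native_and_romanized`, the per-query random selection, and the fixed seed are all assumptions made for illustration. The romanizer is passed in as a callable so the sketch does not depend on any particular uroman version.

```python
import random
from typing import Callable, List


def mix_native_and_romanized(
    queries: List[str],
    romanize: Callable[[str], str],
    ratio: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Return the queries in their original order, with a `ratio` share romanized.

    Hypothetical helper: how the original split was drawn is not documented
    in this card, so random per-query selection is an assumption.
    """
    rng = random.Random(seed)
    n_romanized = int(len(queries) * ratio)
    # Pick which query positions get transliterated.
    chosen = set(rng.sample(range(len(queries)), n_romanized))
    return [romanize(q) if i in chosen else q for i, q in enumerate(queries)]


# With the `uroman` Python package installed, the romanizer could be supplied
# like this (assumed usage; check the package documentation for your version):
#
#   import uroman as ur
#   romanizer = ur.Uroman()
#   mixed = mix_native_and_romanized(queries, romanizer.romanize_string)
```

Keeping the romanizer as a parameter also makes it easy to swap in another transliteration tool or change the ratio when experimenting.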