---
license: mit
datasets:
- unicamp-dl/mmarco
language:
- zh
base_model:
- unicamp-dl/mt5-base-mmarco-v2
---

# mt5-base Reranker ZH mMARCO/v2 Transliterated Queries tokenised with Anserini

This is a variation of Unicamp's [mt5-base Reranker](https://huggingface.co/unicamp-dl/mt5-base-mmarco-v2), which was initially finetuned on mMARCO/v2.

The queries are transliterated from Chinese into romanised (Latin-script) text using [uroman](https://github.com/isi-nlp/uroman), and then tokenised with [pyterrier_anserini](https://github.com/seanmacavaney/pyterrier-anserini/tree/main/pyterrier_anserini).

The model was used for the SIGIR 2025 short paper "Lost in Transliteration: Bridging the Script Gap in Neural IR".
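
The sketch below illustrates how a reranker of this kind is typically applied to a romanised query. It is not taken from this model card: it assumes the common monoT5 convention of a `Query: ... Document: ... Relevant:` prompt and a score obtained from a softmax over the logits of two candidate tokens ("yes" and "no"). The helper names (`build_prompt`, `relevance_score`, `rerank`, `toy_score`) and the example query are hypothetical, and `toy_score` is only a stand-in for the actual model forward pass.

```python
import math
from typing import Callable, List, Tuple


def build_prompt(query: str, passage: str) -> str:
    """Format a (query, passage) pair with the assumed monoT5-style template."""
    return f"Query: {query} Document: {passage} Relevant:"


def relevance_score(logit_yes: float, logit_no: float) -> float:
    """Softmax over the two candidate-token logits; returns P("yes")."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score every passage against the query and sort by descending score."""
    scored = [(p, score_fn(build_prompt(query, p))) for p in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for the model: term overlap used as the 'yes' logit."""
    body = prompt[len("Query: "):-len(" Relevant:")]
    query, passage = body.split(" Document: ", 1)
    overlap = len(set(query.lower().split()) & set(passage.lower().split()))
    return relevance_score(float(overlap), 0.0)


# Hypothetical romanised query; in practice it would come from uroman.
ranked = rerank(
    "beijing shoudu",
    ["Shanghai is a port city.", "Beijing is the capital of China."],
    toy_score,
)
for passage, score in ranked:
    print(f"{score:.3f}  {passage}")
```

To use the real model, `toy_score` would be replaced by a function that feeds the prompt to the reranker and reads the two token logits from the first decoding step; the exact prompt and token choices should be checked against the base model's documentation.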