---
datasets:
- unicamp-dl/mmarco
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: mit
widget: []
base_model:
- BAAI/bge-m3
---

# BGE-m3 RU mMARCO/v2 Native Queries

This is a [BGE-M3](https://huggingface.co/BAAI/bge-m3) model post-trained on the Russian portion of the mMARCO/v2 dataset. The model was used in the SIGIR 2025 short paper "Lost in Transliteration: Bridging the Script Gap in Neural IR".

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity

## Training Details

### Framework Versions

- Python: 3.10.13
- Sentence Transformers: 3.1.1
- Transformers: 4.45.1
- PyTorch: 2.4.1
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.3
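Since the model outputs 1024-dimensional embeddings that are compared with cosine similarity, the scoring step can be sketched as follows. This is a minimal illustration only: the random vectors stand in for real query/document embeddings and are not actual model output.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Placeholder vectors standing in for real 1024-dimensional embeddings.
query_emb = rng.standard_normal(1024)
doc_emb = rng.standard_normal(1024)

# Higher score = more similar; range is [-1, 1].
score = cosine_similarity(query_emb, doc_emb)
assert -1.0 <= score <= 1.0
```

In practice the embeddings would come from encoding a query and a passage with this model (e.g. via the `sentence-transformers` `encode` API), and candidate passages would be ranked by descending cosine score.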