---
license: apache-2.0
base_model:
- intfloat/multilingual-e5-base
language:
- ru
- en
tags:
- sentence-embeddings
- semantic-search
- distillation
- student-model
- multilingual
---

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-model-blue)](https://huggingface.co/skatzR/USER-BGE-M3-E5-Base-Distilled)

# 🧩 Student-Distilled Sentence Embeddings: Deepvk/USER-bge-m3 → intfloat/multilingual-e5-base

✨ This repository contains a **student model distilled from [`Deepvk/USER-BGE-M3`](https://huggingface.co/deepvk/USER-bge-m3)**, using [`intfloat/multilingual-e5-base`](https://huggingface.co/intfloat/multilingual-e5-base) as the base encoder.

The model is designed for **semantic search**, **retrieval**, and **sentence similarity** tasks in **Russian 🇷🇺** and **English 🇬🇧**, and is optimized for **practical use without prefixes**.

---

# 🔍 Model Card

| Property | Value |
|----------|-------|
| **Teacher Model** | [`Deepvk/USER-BGE-M3`](https://huggingface.co/deepvk/USER-bge-m3) |
| **Base Model** | [`intfloat/multilingual-e5-base`](https://huggingface.co/intfloat/multilingual-e5-base) |
| **Distillation Type** | Embedding-level distillation (teacher → student) |
| **Embedding Dim** | 1024 |
| **Projection** | Dense layer (768 → 1024) |
| **Loss Function** | Mean Squared Error (MSE) |
| **Libraries** | `sentence-transformers`, `torch` |
| **License** | Apache-2.0 |
| **Hardware** | CPU & GPU supported |

---

**About Distillation:**

The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3** while keeping the simplicity and flexibility of E5. To achieve this:

- Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
- Student embeddings were trained to minimize the **MSE** against the teacher's embeddings.
- A projection layer (768 → 1024) was added to match the dimensionality of the teacher model.
- **No prefixes (such as "query:" or "passage:")** were used; the student encodes sentences directly.

A minimal, illustrative training sketch is shown after the Training Details section below.

---

## 🚀 Features

- ⚡ **Fast inference**: optimized E5-base architecture with no prefix processing
- 🧠 **High-quality semantic understanding**: inherits BGE's retrieval capability
- 🌍 **Multilingual (RU/EN)**: strong in Russian, solid in English
- 🔄 **Teacher-compatible**: embeddings align closely with Deepvk/USER-BGE-M3
- 🛠 **Sentence-transformers ready**: plug-and-play for semantic search, clustering, and retrieval

---

## 🧠 Intended Use

**✅ Recommended for:**
- Semantic search and retrieval systems
- Text embedding and similarity pipelines
- Multilingual tasks focused on Russian and English
- Clustering and topic discovery

**❌ Not ideal for:**
- Prefix-based retrieval setups (e.g., original E5 behavior)
- Cross-encoder scoring tasks

---

## 📚 Training Details

- **Training Objective:** Mimic teacher embeddings (Deepvk/USER-BGE-M3)
- **Dataset Composition:** Retrieval / Semantic ratio = 60 / 40
- **Language Distribution:** Russian / English ≈ 80 / 20
- **Training Duration:** 5 epochs with warmup and cosine evaluation
- **Optimizer:** AdamW with automatic mixed precision (AMP)
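The training script itself is not shipped with this repository, but the recipe above maps straightforwardly onto the `sentence-transformers` modules API. The sketch below is a minimal, illustrative reconstruction: the corpus placeholder, batch size, learning rate, pooling mode, and projection activation are assumptions rather than the actual configuration, and the warmup schedule is omitted for brevity.

```python
# Illustrative sketch of embedding-level distillation (teacher -> student).
# ASSUMPTIONS (not taken from the real training run): corpus placeholder,
# batch size, learning rate, mean pooling, identity activation on the Dense layer.
import torch
from torch import nn
from sentence_transformers import SentenceTransformer, models

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Student: E5-base encoder + pooling + Dense projection (768 -> 1024)
encoder = models.Transformer("intfloat/multilingual-e5-base")
pooling = models.Pooling(encoder.get_word_embedding_dimension(), pooling_mode="mean")
projection = models.Dense(in_features=768, out_features=1024,
                          activation_function=nn.Identity())
student = SentenceTransformer(modules=[encoder, pooling, projection], device=device)

# Teacher embeddings are precomputed once; no "query:" / "passage:" prefixes.
teacher = SentenceTransformer("deepvk/USER-bge-m3", device=device)
sentences = ["..."]  # placeholder for the RU/EN training corpus
targets = teacher.encode(sentences, convert_to_tensor=True, device=device)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
mse = nn.MSELoss()
batch_size = 32

for epoch in range(5):
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        target = targets[start:start + batch_size]
        features = student.tokenize(batch)
        features = {k: v.to(device) for k, v in features.items()}
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=use_amp):
            pred = student(features)["sentence_embedding"]
            loss = mse(pred, target)  # pull student vectors toward the teacher's
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```

The same recipe can also be expressed with `sentence_transformers.losses.MSELoss` and the library's built-in training loop instead of a manual loop.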
---

## 📊 Evaluation Results

The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets.

---

### 🔹 TL;DR

- The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **very high fidelity**.
- The **original E5-base** embeddings are **incompatible** with the BGE space (cosine ≈ 0).
- **Recall@1: 86.1% (Student)** vs **87.7% (Teacher)** on the test split, i.e. nearly identical retrieval performance.

---

### 🔹 Main Metrics

| Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
|-------|-------|----:|------------:|-----------:|----:|---------:|---------:|----------:|
| **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| | **Student (E5-distilled)** | **0.000288** | **0.8389** | **0.0498** | **0.9158** | **0.8607** | **0.9829** | **0.9955** |
| | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| | **Student (E5-distilled)** | **0.000276** | **0.8462** | **0.0425** | **0.9176** | **0.8608** | **0.9896** | **0.9956** |
| | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |

---

### 🔹 Conclusions

- ✅ **Student ≈ Teacher**: the distilled model learned the teacher's semantic space almost perfectly.
- ❌ **Original E5 ≠ Teacher**: default E5 embeddings are unrelated to BGE's space.
- 📈 **Stable generalization**: validation and test results match closely.
- 🧩 The student is a **drop-in BGE-compatible encoder** with **no prefix requirement**.

---

## 📂 Model Structure

- `USER-BGE-M3-E5-Base-Distilled`: the trained model folder, containing:
  - Transformer encoder (`intfloat/multilingual-e5-base`)
  - Pooling layer
  - Dense projection layer (768 → 1024)
- Fully compatible with the `sentence-transformers` API.

---

## 🧩 Using the Model

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024)
```
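As a quick end-to-end check, the illustrative snippet below ranks a few passages against a query with plain cosine similarity; the query and passages are made up for this example. Because the model was trained without prefixes, texts are passed in as-is.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

# Illustrative RU/EN query and passages; no "query:" / "passage:" prefixes needed.
query = "How do neural networks learn?"
passages = [
    "Нейронные сети обучаются, подстраивая веса с помощью градиентного спуска.",
    "The stock market closed higher on Friday.",
    "Backpropagation adjusts the weights of a neural network to reduce error.",
]

query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)
passage_embs = model.encode(passages, convert_to_tensor=True, normalize_embeddings=True)

# With normalized embeddings, cosine similarity equals the dot product.
scores = util.cos_sim(query_emb, passage_embs)[0]
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```

Since the student reproduces the teacher's embedding space, these vectors can also be compared directly against embeddings produced by `deepvk/USER-bge-m3`, for example when migrating an existing BGE-based index.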