---
license: apache-2.0
base_model:
- intfloat/multilingual-e5-base
language:
- ru
- en
tags:
- sentence-embeddings
- semantic-search
- distillation
- student-model
- multilingual
---

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-model-blue)](https://huggingface.co/skatzR/USER-BGE-M3-E5-Base-Distilled)

# 🧩 Student-Distilled Sentence Embeddings: Deepvk/USER-bge-m3 → intfloat/multilingual-e5-base

✨ This repository contains a **student model distilled from [`Deepvk/USER-BGE-M3`](https://huggingface.co/deepvk/USER-bge-m3)**, using [`intfloat/multilingual-e5-base`](https://huggingface.co/intfloat/multilingual-e5-base) as the base encoder.

The model is designed for **semantic search**, **retrieval**, and **sentence similarity** tasks in **Russian 🇷🇺** and **English 🇬🇧**, and is optimized for **practical use without prefixes**.

---

# 🔍 Model Card

| Property | Value |
|----------|-------|
| **Teacher Model** | [`Deepvk/USER-BGE-M3`](https://huggingface.co/deepvk/USER-bge-m3) |
| **Base Model** | [`intfloat/multilingual-e5-base`](https://huggingface.co/intfloat/multilingual-e5-base) |
| **Distillation Type** | Embedding-level distillation (teacher → student) |
| **Embedding Dim** | 1024 |
| **Projection** | Dense layer (768 → 1024) |
| **Loss Function** | Mean Squared Error (MSE) |
| **Libraries** | `sentence-transformers`, `torch` |
| **License** | Apache-2.0 |
| **Hardware** | CPU & GPU supported |

---

**About Distillation:**

The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3** while keeping the simplicity and flexibility of E5. To achieve this:

- Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
- Student embeddings were trained to minimize the **MSE** against the teacher's embeddings.
- A projection layer (768 → 1024) was added to match the dimensionality of the teacher model.
- **No prefixes (such as "query:" or "passage:")** were used; the student encodes sentences directly.

A minimal, illustrative training sketch is shown after the Training Details section below.

---

## 🚀 Features

- ⚡ **Fast inference**: optimized E5-base architecture with no prefix processing
- 🧠 **High-quality semantic understanding**: inherits BGE's retrieval capability
- 🌍 **Multilingual (RU/EN)**: strong in Russian, solid in English
- 🔄 **Teacher-compatible**: embeddings align closely with Deepvk/USER-BGE-M3
- 🛠 **Sentence-transformers ready**: plug-and-play for semantic search, clustering, and retrieval

---

## 🧠 Intended Use

**✅ Recommended for:**
- Semantic search and retrieval systems
- Text embedding and similarity pipelines
- Multilingual tasks focused on Russian and English
- Clustering and topic discovery

**❌ Not ideal for:**
- Prefix-based retrieval setups (e.g., original E5 behavior)
- Cross-encoder scoring tasks

---

## 📚 Training Details

- **Training Objective:** Mimic teacher embeddings (Deepvk/USER-BGE-M3)
- **Dataset Composition:** Retrieval / Semantic ratio = 60 / 40
- **Language Distribution:** Russian / English ≈ 80 / 20
- **Training Duration:** 5 epochs with warmup and cosine evaluation
- **Optimizer:** AdamW with automatic mixed precision (AMP)
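The training script itself is not shipped with this repository, but the recipe above maps straightforwardly onto the `sentence-transformers` modules API. The sketch below is a minimal, illustrative reconstruction: the corpus placeholder, batch size, learning rate, pooling mode, and projection activation are assumptions rather than the actual configuration, and the warmup schedule is omitted for brevity.

```python
# Illustrative sketch of embedding-level distillation (teacher -> student).
# ASSUMPTIONS (not taken from the real training run): corpus placeholder,
# batch size, learning rate, mean pooling, identity activation on the Dense layer.
import torch
from torch import nn
from sentence_transformers import SentenceTransformer, models

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Student: E5-base encoder + pooling + Dense projection (768 -> 1024)
encoder = models.Transformer("intfloat/multilingual-e5-base")
pooling = models.Pooling(encoder.get_word_embedding_dimension(), pooling_mode="mean")
projection = models.Dense(in_features=768, out_features=1024,
                          activation_function=nn.Identity())
student = SentenceTransformer(modules=[encoder, pooling, projection], device=device)

# Teacher embeddings are precomputed once; no "query:" / "passage:" prefixes.
teacher = SentenceTransformer("deepvk/USER-bge-m3", device=device)
sentences = ["..."]  # placeholder for the RU/EN training corpus
targets = teacher.encode(sentences, convert_to_tensor=True, device=device)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
mse = nn.MSELoss()
batch_size = 32

for epoch in range(5):
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        target = targets[start:start + batch_size]
        features = student.tokenize(batch)
        features = {k: v.to(device) for k, v in features.items()}
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=use_amp):
            pred = student(features)["sentence_embedding"]
            loss = mse(pred, target)  # pull student vectors toward the teacher's
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```

The same recipe can also be expressed with `sentence_transformers.losses.MSELoss` and the library's built-in training loop instead of a manual loop.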
---

## 📊 Evaluation Results

The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets.

---

### 🔹 TL;DR

- The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **very high fidelity**.
- The **original E5-base** embeddings are **incompatible** with the BGE space (cosine ≈ 0).
- **Recall@1: 86.1% (Student)** vs **87.7% (Teacher)** on the test split, i.e. nearly identical retrieval performance.

---

### 🔹 Main Metrics

| Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
|-------|-------|----:|------------:|-----------:|----:|---------:|---------:|----------:|
| **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| | **Student (E5-distilled)** | **0.000288** | **0.8389** | **0.0498** | **0.9158** | **0.8607** | **0.9829** | **0.9955** |
| | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| | **Student (E5-distilled)** | **0.000276** | **0.8462** | **0.0425** | **0.9176** | **0.8608** | **0.9896** | **0.9956** |
| | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |

---

### 🔹 Conclusions

- ✅ **Student ≈ Teacher**: the distilled model learned the teacher's semantic space almost perfectly.
- ❌ **Original E5 ≠ Teacher**: default E5 embeddings are unrelated to BGE's space.
- 📈 **Stable generalization**: validation and test results match closely.
- 🧩 The student is a **drop-in BGE-compatible encoder** with **no prefix requirement**.

---

## 📂 Model Structure

- `USER-BGE-M3-E5-Base-Distilled`: the trained model folder, containing:
  - Transformer encoder (`intfloat/multilingual-e5-base`)
  - Pooling layer
  - Dense projection layer (768 → 1024)
- Fully compatible with the `sentence-transformers` API.

---

## 🧩 Using the Model

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024)
```
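As a quick end-to-end check, the illustrative snippet below ranks a few passages against a query with plain cosine similarity; the query and passages are made up for this example. Because the model was trained without prefixes, texts are passed in as-is.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

# Illustrative RU/EN query and passages; no "query:" / "passage:" prefixes needed.
query = "How do neural networks learn?"
passages = [
    "Нейронные сети обучаются, подстраивая веса с помощью градиентного спуска.",
    "The stock market closed higher on Friday.",
    "Backpropagation adjusts the weights of a neural network to reduce error.",
]

query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)
passage_embs = model.encode(passages, convert_to_tensor=True, normalize_embeddings=True)

# With normalized embeddings, cosine similarity equals the dot product.
scores = util.cos_sim(query_emb, passage_embs)[0]
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```

Since the student reproduces the teacher's embedding space, these vectors can also be compared directly against embeddings produced by `deepvk/USER-bge-m3`, for example when migrating an existing BGE-based index.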