skatzR committed
Commit 0561347 · verified · 1 Parent(s): f25b804

Update README.md

Files changed (1)
  1. README.md +57 -24
README.md CHANGED
@@ -37,14 +37,14 @@ The model is designed for **semantic search**, **retrieval**, and **sentence sim
37
 
38
  ---
39
 
40
- **About Distillation:**
41
  The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
42
- To achieve this:
43
 
44
  - Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
45
- - Student embeddings were trained to minimize the **MSE** with the teacher’s embeddings.
46
- - A projection layer (768→1024) was added to match the dimensionality of the teacher model.
47
- - **No prefixes (like “query:” or “passage:”)** were used — the student encodes sentences directly.
48
 
49
  ---
50
 
@@ -84,37 +84,70 @@ To achieve this:
84
 
85
  ## 📊 Evaluation Results
86
 
87
- The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets.
88
 
89
  ---
90
 
91
- ### 🔹 TL;DR
92
 
93
- - The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **very high fidelity**.
94
- - The **original E5-base** embeddings are **incompatible** with the BGE space (cosine ≈ 0).
95
- - **Recall@1: 86% (Student)** vs **87.7% (Teacher)** — nearly identical retrieval performance.
96
 
97
  ---
98
 
99
- ### 🔹 Main Metrics
100
 
101
- | Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
102
- |--------------|--------------------|----------:|-------------:|------------:|--------:|----------:|----------:|----------:|
103
- | **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
104
- | | **Student (E5-distilled)** | **0.000288** | **0.8389** | **0.0498** | **0.9158** | **0.8607** | **0.9829** | **0.9955** |
105
- | | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
106
- | **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
107
- | | **Student (E5-distilled)** | **0.000276** | **0.8462** | **0.0425** | **0.9176** | **0.8608** | **0.9896** | **0.9956** |
108
- | | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
109
 
110
  ---
111
 
112
  ### 🔹 Conclusions
113
 
114
- - **Student ≈ Teacher** — the distilled model learned the teacher’s semantic space almost perfectly.
115
- - **Original E5 ≠ Teacher** — default E5 embeddings are unrelated to BGE’s space.
116
- - 📈 **Stable generalization:** validation and test results match closely.
117
- - 🧩 The new student is a **drop-in BGE-compatible encoder**, with **no prefix requirement**.
118
 
119
  ---
120
 
@@ -134,4 +167,4 @@ The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the *
134
  from sentence_transformers import SentenceTransformer
135
 
136
  model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
137
- embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
 
37
 
38
  ---
39
 
40
+ ## About Distillation
41
  The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
42
+ Key points (a minimal training sketch follows this list):
43
 
44
  - Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
45
+ - Student embeddings were trained to minimize **MSE** with the teacher’s embeddings.
46
+ - A projection layer (768→1024) was added to match the teacher’s embedding dimensionality.
47
+ - **No prefixes** (like “query:” or “passage:”) were used — the student encodes sentences directly.
48
 
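+ The bullet points above can be summarized in a minimal training sketch. Everything below is illustrative only: the data pipeline, mean pooling, the learning rate, and whether the teacher targets were L2-normalized are assumptions rather than the exact code used to train this model; the model ids are the ones referenced in this card.
+
+ ```python
+ # Illustrative distillation step; hyperparameters and pooling are assumptions, not the released training code.
+ import torch
+ import torch.nn as nn
+ from sentence_transformers import SentenceTransformer
+ from transformers import AutoModel, AutoTokenizer
+
+ teacher = SentenceTransformer("deepvk/USER-bge-m3")   # teacher referenced in this card (Hub id casing may differ)
+ tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
+ encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-base")
+ projection = nn.Linear(768, 1024)                     # map E5's 768-dim output into the teacher's 1024-dim space
+ encoder.train()
+
+ def encode_student(sentences):
+     # No "query:" / "passage:" prefixes; sentences are encoded directly.
+     batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
+     hidden = encoder(**batch).last_hidden_state              # (batch, seq_len, 768)
+     mask = batch["attention_mask"].unsqueeze(-1).float()
+     pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling over tokens
+     return projection(pooled)                                # (batch, 1024)
+
+ optimizer = torch.optim.AdamW(
+     list(encoder.parameters()) + list(projection.parameters()), lr=2e-5
+ )
+
+ sentences = ["Привет мир", "Hello world"]                    # stand-in for a real training batch
+ with torch.no_grad():
+     targets = torch.tensor(teacher.encode(sentences))        # precomputed teacher embeddings
+
+ loss = nn.functional.mse_loss(encode_student(sentences), targets)  # pull the student into the teacher's space
+ loss.backward()
+ optimizer.step()
+ ```
+
+ In practice the teacher embeddings would be precomputed once over the full training corpus (as described above) rather than encoded on the fly.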
49
  ---
50
 
 
84
 
85
  ## 📊 Evaluation Results
86
 
87
+ The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets, as well as on **EN (MS MARCO)** and **RU (SberQuad)** benchmarks.
88
+
89
+ ### 🔹 TL;DR Summary
90
+
91
+ - The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **high fidelity**.
92
+ - The **original E5-base** embeddings are **incompatible** with the teacher’s space (cosine ≈ 0).
93
+ - Recall@1 (validation/test): **≈ 86% (Student)** vs **87.7% (Teacher)**, nearly identical retrieval performance.
94
+ - Recall@1 (RU benchmark, SberQuad): **65.2% (Student)** vs **59.9% (Teacher)** — the student even outperforms the teacher on Russian.
95
 
96
  ---
97
 
98
+ ### 🔹 Consolidated Evaluation Results
99
+
100
+ This table combines **main validation/test metrics** with **additional EN/RU benchmarks**.
101
+ Note: the EN/RU benchmarks are external datasets used to test retrieval performance; they are **not part of the training/validation splits**. For the Validation/Test rows the cosine columns measure agreement with the teacher’s embeddings, while for the benchmark rows they are query-passage similarities (Cosine_Pos / Cosine_Neg). A sketch of how the retrieval metrics can be computed follows the table.
102
+
103
+ | Dataset / Split | Model | MSE | Cosine mean / Cosine_Pos | Cosine std / Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR | Recall@1 | Recall@5 | Recall@10 |
104
+ |--------------------|--------------------|----------|-------------------------|-----------------------------|------------------------|----------------|--------|----------|----------|-----------|
105
+ | **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
106
+ | | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | — | — | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
107
+ | | e5-base (original) | 0.001866 | -0.0042 | 0.0297 | — | — | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
108
+ | **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
109
+ | | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | — | — | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
110
+ | | e5-base (original) | 0.001867 | -0.0027 | 0.0293 | — | — | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
111
+ | **EN Benchmark** (MS MARCO) | Teacher (BGE-M3) | — | 0.6710 | 0.0724 | 0.5575 | 0.0676 | 0.6362 | 0.4385 | 0.9205 | 1.0000 |
112
+ | | Student (E5-distilled) | — | 0.7233 | 0.0670 | 0.6269 | 0.0615 | 0.5912 | 0.3745 | 0.9130 | 1.0000 |
113
+ | | e5-base (original) | — | 0.8886 | 0.0259 | 0.8427 | 0.0264 | 0.6852 | 0.5100 | 0.9380 | 1.0000 |
114
+ | **RU Benchmark** (SberQuad) | Teacher (BGE-M3) | — | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 | 0.5990 | 1.0000 | 1.0000 |
115
+ | | Student (E5-distilled) | — | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 | 0.6525 | 1.0000 | 1.0000 |
116
+ | | e5-base (original) | — | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 | 0.4915 | 1.0000 | 1.0000 |
117
 
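+ To make the metric columns concrete, here is a hedged sketch of how Recall@k, MRR and the Cosine_Pos / Cosine_Neg statistics can be computed. It assumes one relevant passage per query and cosine-similarity ranking over L2-normalized embeddings; it is not necessarily the exact evaluation script used for this card.
+
+ ```python
+ # Hedged sketch of the retrieval metrics reported above (Recall@k, MRR, Cosine_Pos/Neg).
+ import numpy as np
+
+ def retrieval_metrics(query_emb, passage_emb, ks=(1, 5, 10)):
+     """query_emb[i] is relevant to passage_emb[i]; embeddings are L2-normalized, so dot product = cosine."""
+     scores = query_emb @ passage_emb.T                        # (n_queries, n_passages) similarity matrix
+     gold = np.arange(len(scores))
+     order = np.argsort(-scores, axis=1)                       # passages sorted by decreasing similarity
+     rank = np.argmax(order == gold[:, None], axis=1) + 1      # 1-based rank of the gold passage
+     cos_pos = scores[gold, gold]                              # query vs its gold passage
+     cos_neg = (scores.sum(axis=1) - cos_pos) / (scores.shape[1] - 1)  # mean over non-gold passages
+     metrics = {"MRR": float(np.mean(1.0 / rank)),
+                "Cosine_Pos": float(cos_pos.mean()),
+                "Cosine_Neg": float(cos_neg.mean())}
+     for k in ks:
+         metrics[f"Recall@{k}"] = float(np.mean(rank <= k))
+     return metrics
+ ```
+
+ Feeding `model.encode(queries, normalize_embeddings=True)` and `model.encode(passages, normalize_embeddings=True)` into `retrieval_metrics` reproduces the kind of numbers shown above, up to the exact candidate pool used.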
118
+ ---
119
+
120
+ ### Summary (EN)
121
+ - Student closely reproduces the teacher’s embedding space.
122
+ - Slight drop in Recall@1 (−6.4 p.p.), but Recall@10 remains perfect.
123
+ - Student embeddings are **more compact**: cosine similarities are slightly higher and less spread out than the teacher’s.
124
+ - The original e5-base is incompatible with the teacher’s space.
125
 
126
  ---
127
 
128
+ ### 🔹 Russian Benchmark — SberQuad
129
+
130
+ | Model | Recall@1 | Recall@5 | Recall@10 | Cosine_Pos | Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR |
131
+ |------------------------|----------|----------|-----------|------------|----------------|------------|----------------|-------|
132
+ | Teacher (USER-BGE-M3) | 0.5990 | 1.0000 | 1.0000 | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 |
133
+ | Student (E5-distilled) | 0.6525 | 1.0000 | 1.0000 | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 |
134
+ | multilingual-e5-base | 0.4915 | 1.0000 | 1.0000 | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 |
135
+ | **Δ Student–Teacher** | +0.0535 | 0.0000 | 0.0000 | +0.0646 | −0.0093 | +0.0645 | −0.0125 | +0.0268 |
136
 
137
+ ### Summary (RU)
138
+ - Student surpasses teacher on Recall@1 (+5.35 p.p.) and MRR (+2.7 p.p.).
139
+ - Embedding space is more semantically consistent (higher cosine values).
140
+ - Base e5 remains incompatible, as expected.
141
 
142
  ---
143
 
144
  ### 🔹 Conclusions
145
 
146
+ - 🧩 **Distillation succeeded:** the student replicates the teacher’s embedding space closely.
147
+ - 🇷🇺 On Russian, the student **outperforms the teacher**, with better Recall@1 and MRR.
148
+ - 🇬🇧 On English, student performance is nearly identical, with a minimal Recall@1 drop.
149
+ - The original e5-base embeddings remain incompatible with the teacher’s space.
150
+ - 🔄 **Final Result:** a bilingual, lightweight student that preserves the teacher’s quality, with no prefix requirements.
151
 
152
  ---
153
 
 
167
  from sentence_transformers import SentenceTransformer
168
 
169
  model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
170
+ embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
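+
+ # Optional sanity check (illustrative sketch; assumes the teacher also loads via sentence-transformers
+ # and that the Hub id below matches the teacher referenced in this card).
+ # The student was distilled into the teacher's embedding space, so cosine similarity between the two
+ # models' outputs for the same sentences should be high (around 0.84 on average per the tables above).
+ teacher = SentenceTransformer("deepvk/USER-bge-m3")
+ teacher_embeddings = teacher.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
+ print((embeddings * teacher_embeddings).sum(axis=1))  # per-sentence cosine; both sets are normalized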