Update README.md
---

## About Distillation

The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.

Key points:

- Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
- Student embeddings were trained to minimize **MSE** with the teacher’s embeddings.
- A projection layer (768→1024) was added to match the teacher’s embedding dimensionality.
- **No prefixes** (like “query:” or “passage:”) were used — the student encodes sentences directly.
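The recipe above (MSE against precomputed teacher embeddings, through a 768→1024 projection) can be sketched in a few lines. This is a toy NumPy illustration, not the actual training code: random arrays stand in for the real E5/BGE-M3 embeddings, and plain gradient descent stands in for the real optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for precomputed embeddings (real dims: student 768, teacher 1024).
batch, d_student, d_teacher = 4, 768, 1024
student_emb = rng.normal(size=(batch, d_student))   # E5-base sentence embeddings
teacher_emb = rng.normal(size=(batch, d_teacher))   # precomputed USER-BGE-M3 targets

# Projection layer mapping the student space (768) into the teacher space (1024).
W = rng.normal(scale=0.02, size=(d_student, d_teacher))

def mse(pred, target):
    return float(((pred - target) ** 2).mean())

# One gradient-descent step on the MSE objective.
pred = student_emb @ W
loss_before = mse(pred, teacher_emb)
grad_W = 2.0 * student_emb.T @ (pred - teacher_emb) / pred.size
W -= 0.1 * grad_W
loss_after = mse(student_emb @ W, teacher_emb)
print(loss_before > loss_after)  # True: the step moves the projection toward the teacher
```

In the real setup the student encoder is trained end to end together with the projection; the linear map above only illustrates the objective.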

---

## 📊 Evaluation Results

The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets, as well as on **EN (MS MARCO)** and **RU (SberQuad)** benchmarks.

### 🔹 TL;DR Summary

- The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **high fidelity**.
- The **original E5-base** embeddings are **incompatible** with the teacher’s space (cosine ≈ 0).
- Recall@1 (test split): **EN ≈ 86% (Student)** vs **87.7% (Teacher)**.
- Recall@1 (SberQuad): **RU ≈ 65.2% (Student)** vs **59.9% (Teacher)** — the student even outperforms the teacher on Russian.
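The “cosine ≈ 0” point is exactly what unrelated embedding spaces look like: in high dimensions, independent directions are nearly orthogonal. A toy demonstration with random vectors standing in for the two models’ embeddings (not the actual model outputs):

```python
import numpy as np

# Embeddings from two unrelated spaces behave like independent random
# directions, and independent random directions in high dimensions are
# nearly orthogonal — hence cosine similarity close to zero.
rng = np.random.default_rng(0)
dim = 1024  # teacher embedding dimensionality

a = rng.normal(size=(200, dim))  # stand-in for one model's embeddings
b = rng.normal(size=(200, dim))  # stand-in for the other model's embeddings
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)

mean_abs_cosine = float(np.abs(np.sum(a * b, axis=1)).mean())
print(mean_abs_cosine)  # small: typical |cosine| scales like 1/sqrt(dim) ≈ 0.03
```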

---

## 📊 Consolidated Evaluation Results

This table combines **main validation/test metrics** with **additional EN/RU benchmarks**.
Note: the EN/RU benchmarks are external datasets used to test retrieval performance; they are **not part of the training/validation splits**.

| Dataset / Split | Model | MSE | Cosine mean / Cosine_Pos | Cosine std / Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR | Recall@1 | Recall@5 | Recall@10 |
|--------------------|--------------------|----------|-------------------------|-----------------------------|------------|----------------|--------|----------|----------|-----------|
| **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | — | — | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
| | e5-base (original) | 0.001866 | -0.0042 | 0.0297 | — | — | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | — | — | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
| | e5-base (original) | 0.001867 | -0.0027 | 0.0293 | — | — | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
| **EN Benchmark** (MS MARCO) | Teacher (BGE-M3) | — | 0.6710 | 0.0724 | 0.5575 | 0.0676 | 0.6362 | 0.4385 | 0.9205 | 1.0000 |
| | Student (E5-distilled) | — | 0.7233 | 0.0670 | 0.6269 | 0.0615 | 0.5912 | 0.3745 | 0.9130 | 1.0000 |
| | e5-base (original) | — | 0.8886 | 0.0259 | 0.8427 | 0.0264 | 0.6852 | 0.5100 | 0.9380 | 1.0000 |
| **RU Benchmark** (SberQuad) | Teacher (BGE-M3) | — | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 | 0.5990 | 1.0000 | 1.0000 |
| | Student (E5-distilled) | — | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 | 0.6525 | 1.0000 | 1.0000 |
| | e5-base (original) | — | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 | 0.4915 | 1.0000 | 1.0000 |
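The Recall@K and MRR columns follow the standard retrieval definitions: each query’s true passage is ranked among all passages by cosine similarity. A minimal sketch of those metrics (illustrative function, not the evaluation script actually used here):

```python
import numpy as np

def retrieval_metrics(query_emb: np.ndarray, doc_emb: np.ndarray, k_values=(1, 5, 10)):
    """Query i's correct document is doc i; rank all docs by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                                   # (n_queries, n_docs) cosine matrix
    order = np.argsort(-sims, axis=1)                # best-first document indices
    # 1-based rank of the correct document for each query
    ranks = np.argmax(order == np.arange(len(q))[:, None], axis=1) + 1
    out = {"MRR": float(np.mean(1.0 / ranks))}
    for k in k_values:
        out[f"Recall@{k}"] = float(np.mean(ranks <= k))
    return out

# Sanity check: if queries and documents coincide, every metric is perfect.
emb = np.random.default_rng(0).normal(size=(32, 64))
print(retrieval_metrics(emb, emb)["Recall@1"])  # 1.0
```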

---

### ✅ Summary (EN)

- Student closely reproduces the teacher’s embedding space.
- Slight drop in Recall@1 (−6.4 p.p.), but Recall@10 remains perfect.
- Student embeddings are **more compact**, with slightly higher cosine similarities.
- Original e5-base is incompatible with the teacher’s space.

---

### 🔹 Russian Benchmark — SberQuad

| Model | Recall@1 | Recall@5 | Recall@10 | Cosine_Pos | Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR |
|------------------------|----------|----------|-----------|------------|----------------|------------|----------------|-------|
| Teacher (USER-BGE-M3) | 0.5990 | 1.0000 | 1.0000 | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 |
| Student (E5-distilled) | 0.6525 | 1.0000 | 1.0000 | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 |
| multilingual-e5-base | 0.4915 | 1.0000 | 1.0000 | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 |
| **Δ Student–Teacher** | +0.0535 | 0.0000 | 0.0000 | +0.0646 | −0.0093 | +0.0645 | −0.0125 | +0.0268 |
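The Δ row is plain subtraction of the teacher row from the student row; a quick recomputation of a few of its columns (values copied from the SberQuad table above):

```python
# Values copied from the SberQuad table rows above.
teacher = {"Recall@1": 0.5990, "Cosine_Pos": 0.6070, "Cosine_Neg": 0.5790, "MRR": 0.7995}
student = {"Recall@1": 0.6525, "Cosine_Pos": 0.6716, "Cosine_Neg": 0.6435, "MRR": 0.8263}

# Student minus teacher, rounded to the table's 4 decimal places.
delta = {k: round(student[k] - teacher[k], 4) for k in teacher}
print(delta)  # {'Recall@1': 0.0535, 'Cosine_Pos': 0.0646, 'Cosine_Neg': 0.0645, 'MRR': 0.0268}
```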

### ✅ Summary (RU)

- Student surpasses teacher on Recall@1 (+5.35 p.p.) and MRR (+2.7 p.p.).
- Its embedding space is more semantically consistent (higher cosine values).
- Base e5 remains incompatible, as expected.

---

### 🔹 Conclusions

- 🧩 **Distillation succeeded:** the student replicates the teacher’s embedding space closely.
- 🇷🇺 On Russian, the student **outperforms the teacher** — better Recall@1 and MRR.
- 🇬🇧 On English, student performance is nearly identical, with a minimal Recall@1 drop.
- ⚡ Base e5 embeddings remain incompatible with the teacher space.
- 🔄 **Final Result:** a bilingual, lightweight student preserving teacher quality, without prefix requirements.

---
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
```