skatzR committed
Commit 0561347 · verified · 1 Parent(s): f25b804

Update README.md

Files changed (1)
  1. README.md +57 -24
README.md CHANGED
@@ -37,14 +37,14 @@ The model is designed for **semantic search**, **retrieval**, and **sentence sim
37
 
38
  ---
39
 
40
- **About Distillation:**
41
  The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
42
- To achieve this:
43
 
44
  - Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
45
- - Student embeddings were trained to minimize the **MSE** with the teacher’s embeddings.
46
- - A projection layer (768→1024) was added to match the dimensionality of the teacher model.
47
- - **No prefixes (like “query:” or “passage:”)** were used — the student encodes sentences directly.
48
 
49
  ---
50
 
@@ -84,37 +84,70 @@ To achieve this:
84
 
85
  ## 📊 Evaluation Results
86
 
87
- The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets.
88
 
89
  ---
90
 
91
- ### 🔹 TL;DR
92
 
93
- - The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **very high fidelity**.
94
- - The **original E5-base** embeddings are **incompatible** with the BGE space (cosine ≈ 0).
95
- - **Recall@1: 86% (Student)** vs **87.7% (Teacher)** — nearly identical retrieval performance.
96
 
97
  ---
98
 
99
- ### 🔹 Main Metrics
100
 
101
- | Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
102
- |--------------|--------------------|----------:|-------------:|------------:|--------:|----------:|----------:|----------:|
103
- | **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
104
- | | **Student (E5-distilled)** | **0.000288** | **0.8389** | **0.0498** | **0.9158** | **0.8607** | **0.9829** | **0.9955** |
105
- | | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
106
- | **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
107
- | | **Student (E5-distilled)** | **0.000276** | **0.8462** | **0.0425** | **0.9176** | **0.8608** | **0.9896** | **0.9956** |
108
- | | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
109
 
110
  ---
111
 
112
  ### 🔹 Conclusions
113
 
114
- - **Student ≈ Teacher** — the distilled model learned the teacher’s semantic space almost perfectly.
115
- - **Original E5 ≠ Teacher** — default E5 embeddings are unrelated to BGE’s space.
116
- - 📈 **Stable generalization:** validation and test results match closely.
117
- - 🧩 The new student is a **drop-in BGE-compatible encoder**, with **no prefix requirement**.
118
 
119
  ---
120
 
@@ -134,4 +167,4 @@ The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the *
134
  from sentence_transformers import SentenceTransformer
135
 
136
  model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
137
- embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
 
37
 
38
  ---
39
 
40
+ ## About Distillation
41
  The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
42
+ Key points (a minimal training sketch follows this list):
43
 
44
  - Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
45
+ - Student embeddings were trained to minimize **MSE** with the teacher’s embeddings.
46
+ - A projection layer (768→1024) was added to match the teacher’s embedding dimensionality.
47
+ - **No prefixes** (like “query:” or “passage:”) were used — the student encodes sentences directly.
48
 
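+ The bullet points above can be summarized in a minimal training sketch. Everything below is illustrative only: the data pipeline, mean pooling, the learning rate, and whether the teacher targets were L2-normalized are assumptions rather than the exact code used to train this model; the model ids are the ones referenced in this card.
+
+ ```python
+ # Illustrative distillation step; hyperparameters and pooling are assumptions, not the released training code.
+ import torch
+ import torch.nn as nn
+ from sentence_transformers import SentenceTransformer
+ from transformers import AutoModel, AutoTokenizer
+
+ teacher = SentenceTransformer("deepvk/USER-bge-m3")   # teacher referenced in this card (Hub id casing may differ)
+ tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
+ encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-base")
+ projection = nn.Linear(768, 1024)                     # map E5's 768-dim output into the teacher's 1024-dim space
+ encoder.train()
+
+ def encode_student(sentences):
+     # No "query:" / "passage:" prefixes; sentences are encoded directly.
+     batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
+     hidden = encoder(**batch).last_hidden_state              # (batch, seq_len, 768)
+     mask = batch["attention_mask"].unsqueeze(-1).float()
+     pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling over tokens
+     return projection(pooled)                                # (batch, 1024)
+
+ optimizer = torch.optim.AdamW(
+     list(encoder.parameters()) + list(projection.parameters()), lr=2e-5
+ )
+
+ sentences = ["Привет мир", "Hello world"]                    # stand-in for a real training batch
+ with torch.no_grad():
+     targets = torch.tensor(teacher.encode(sentences))        # precomputed teacher embeddings
+
+ loss = nn.functional.mse_loss(encode_student(sentences), targets)  # pull the student into the teacher's space
+ loss.backward()
+ optimizer.step()
+ ```
+
+ In practice the teacher embeddings would be precomputed once over the full training corpus (as described above) rather than encoded on the fly.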
49
  ---
50
 
 
84
 
85
  ## 📊 Evaluation Results
86
 
87
+ The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets, as well as on **EN (MS MARCO)** and **RU (SberQuad)** benchmarks.
88
+
89
+ ### 🔹 TL;DR Summary
90
+
91
+ - The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **high fidelity**.
92
+ - The **original E5-base** embeddings are **incompatible** with the teacher’s space (cosine ≈ 0).
93
+ - Recall@1 (validation/test): **≈ 86% (Student)** vs **87.7% (Teacher)**, nearly identical retrieval performance.
94
+ - Recall@1 (RU benchmark, SberQuad): **65.2% (Student)** vs **59.9% (Teacher)** — the student even outperforms the teacher on Russian.
95
 
96
  ---
97
 
98
+ ### 🔹 Consolidated Evaluation Results
99
+
100
+ This table combines **main validation/test metrics** with **additional EN/RU benchmarks**.
101
+ Note: the EN/RU benchmarks are external datasets used to test retrieval performance; they are **not part of the training/validation splits**. For the Validation/Test rows the cosine columns measure agreement with the teacher’s embeddings, while for the benchmark rows they are query-passage similarities (Cosine_Pos / Cosine_Neg). A sketch of how the retrieval metrics can be computed follows the table.
102
+
103
+ | Dataset / Split | Model | MSE | Cosine mean / Cosine_Pos | Cosine std / Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR | Recall@1 | Recall@5 | Recall@10 |
104
+ |--------------------|--------------------|----------|-------------------------|-----------------------------|------------------------|----------------|--------|----------|----------|-----------|
105
+ | **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
106
+ | | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | — | — | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
107
+ | | e5-base (original) | 0.001866 | -0.0042 | 0.0297 | — | — | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
108
+ | **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
109
+ | | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | — | — | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
110
+ | | e5-base (original) | 0.001867 | -0.0027 | 0.0293 | — | — | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
111
+ | **EN Benchmark** (MS MARCO) | Teacher (BGE-M3) | — | 0.6710 | 0.0724 | 0.5575 | 0.0676 | 0.6362 | 0.4385 | 0.9205 | 1.0000 |
112
+ | | Student (E5-distilled) | — | 0.7233 | 0.0670 | 0.6269 | 0.0615 | 0.5912 | 0.3745 | 0.9130 | 1.0000 |
113
+ | | e5-base (original) | — | 0.8886 | 0.0259 | 0.8427 | 0.0264 | 0.6852 | 0.5100 | 0.9380 | 1.0000 |
114
+ | **RU Benchmark** (SberQuad) | Teacher (BGE-M3) | — | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 | 0.5990 | 1.0000 | 1.0000 |
115
+ | | Student (E5-distilled) | — | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 | 0.6525 | 1.0000 | 1.0000 |
116
+ | | e5-base (original) | — | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 | 0.4915 | 1.0000 | 1.0000 |
117
 
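+ To make the metric columns concrete, here is a hedged sketch of how Recall@k, MRR and the Cosine_Pos / Cosine_Neg statistics can be computed. It assumes one relevant passage per query and cosine-similarity ranking over L2-normalized embeddings; it is not necessarily the exact evaluation script used for this card.
+
+ ```python
+ # Hedged sketch of the retrieval metrics reported above (Recall@k, MRR, Cosine_Pos/Neg).
+ import numpy as np
+
+ def retrieval_metrics(query_emb, passage_emb, ks=(1, 5, 10)):
+     """query_emb[i] is relevant to passage_emb[i]; embeddings are L2-normalized, so dot product = cosine."""
+     scores = query_emb @ passage_emb.T                        # (n_queries, n_passages) similarity matrix
+     gold = np.arange(len(scores))
+     order = np.argsort(-scores, axis=1)                       # passages sorted by decreasing similarity
+     rank = np.argmax(order == gold[:, None], axis=1) + 1      # 1-based rank of the gold passage
+     cos_pos = scores[gold, gold]                              # query vs its gold passage
+     cos_neg = (scores.sum(axis=1) - cos_pos) / (scores.shape[1] - 1)  # mean over non-gold passages
+     metrics = {"MRR": float(np.mean(1.0 / rank)),
+                "Cosine_Pos": float(cos_pos.mean()),
+                "Cosine_Neg": float(cos_neg.mean())}
+     for k in ks:
+         metrics[f"Recall@{k}"] = float(np.mean(rank <= k))
+     return metrics
+ ```
+
+ Feeding `model.encode(queries, normalize_embeddings=True)` and `model.encode(passages, normalize_embeddings=True)` into `retrieval_metrics` reproduces the kind of numbers shown above, up to the exact candidate pool used.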
118
+ ---
119
+
120
+ ### Summary (EN)
121
+ - Student closely reproduces the teacher’s embedding space.
122
+ - Slight drop in Recall@1 (−6.4 p.p.), but Recall@10 remains perfect.
123
+ - Student embeddings are **more compact**: cosine similarities are slightly higher and less spread out than the teacher’s.
124
+ - The original e5-base is incompatible with the teacher’s space.
125
 
126
  ---
127
 
128
+ ### 🔹 Russian Benchmark — SberQuad
129
+
130
+ | Model | Recall@1 | Recall@5 | Recall@10 | Cosine_Pos | Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR |
131
+ |------------------------|----------|----------|-----------|------------|----------------|------------|----------------|-------|
132
+ | Teacher (USER-BGE-M3) | 0.5990 | 1.0000 | 1.0000 | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 |
133
+ | Student (E5-distilled) | 0.6525 | 1.0000 | 1.0000 | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 |
134
+ | multilingual-e5-base | 0.4915 | 1.0000 | 1.0000 | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 |
135
+ | **Δ Student–Teacher** | +0.0535 | 0.0000 | 0.0000 | +0.0646 | −0.0093 | +0.0645 | −0.0125 | +0.0268 |
136
 
137
+ ### Summary (RU)
138
+ - Student surpasses teacher on Recall@1 (+5.35 p.p.) and MRR (+2.7 p.p.).
139
+ - Embedding space is more semantically consistent (higher cosine values).
140
+ - Base e5 remains incompatible, as expected.
141
 
142
  ---
143
 
144
  ### 🔹 Conclusions
145
 
146
+ - 🧩 **Distillation succeeded:** the student replicates the teacher’s embedding space closely.
147
+ - 🇷🇺 On Russian, the student **outperforms the teacher**, with better Recall@1 and MRR.
148
+ - 🇬🇧 On English, student performance is nearly identical, with a minimal Recall@1 drop.
149
+ - The original e5-base embeddings remain incompatible with the teacher’s space.
150
+ - 🔄 **Final Result:** a bilingual, lightweight student that preserves the teacher’s quality, with no prefix requirements.
151
 
152
  ---
153
 
 
167
  from sentence_transformers import SentenceTransformer
168
 
169
  model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
170
+ embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
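+
+ # Optional sanity check (illustrative sketch; assumes the teacher also loads via sentence-transformers
+ # and that the Hub id below matches the teacher referenced in this card).
+ # The student was distilled into the teacher's embedding space, so cosine similarity between the two
+ # models' outputs for the same sentences should be high (around 0.84 on average per the tables above).
+ teacher = SentenceTransformer("deepvk/USER-bge-m3")
+ teacher_embeddings = teacher.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
+ print((embeddings * teacher_embeddings).sum(axis=1))  # per-sentence cosine; both sets are normalized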