skatzR committed on
Commit
da2023f
·
verified ·
1 Parent(s): 0561347

Update README.md

Files changed (1)
  1. README.md +24 -57
README.md CHANGED
@@ -37,14 +37,14 @@ The model is designed for **semantic search**, **retrieval**, and **sentence sim
37
 
38
  ---
39
 
40
- ## About Distillation
41
  The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
42
- Key points:
43
 
44
  - Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
45
- - Student embeddings were trained to minimize **MSE** with the teacher’s embeddings.
46
- - A projection layer (768→1024) was added to match the teacher’s embedding dimensionality.
47
- - **No prefixes** (like “query:” or “passage:”) were used — the student encodes sentences directly.
48
 
49
  ---
50
 
@@ -84,70 +84,37 @@ Key points:
84
 
85
  ## 📊 Evaluation Results
86
 
87
- The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets, as well as on **EN (MS MARCO)** and **RU (SberQuad)** benchmarks.
88
-
89
- ### 🔹 TL;DR Summary
90
-
91
- - The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **high fidelity**.
92
- - The **original E5-base** embeddings are **incompatible** with the teacher’s space (cosine ≈ 0).
93
- - Recall@1: **EN ≈ 86% (Student)** vs **87.7% (Teacher)**
94
- - Recall@1: **RU ≈ 65.2% (Student)** vs **59.9% (Teacher)** — student even outperforms teacher on Russian.
95
 
96
  ---
97
 
98
- ## 📊 Consolidated Evaluation Results
99
-
100
- This table combines **main validation/test metrics** with **additional EN/RU benchmarks**.
101
- Note: EN/RU Benchmarks are external datasets used to test retrieval performance; they are **not part of the training/validation splits**.
102
-
103
- | Dataset / Split | Model | MSE | Cosine mean / Cosine_Pos | Cosine std / Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR | Recall@1 | Recall@5 | Recall@10 |
104
- |--------------------|--------------------|----------|-------------------------|-----------------------------|------------------------|----------------|--------|----------|----------|-----------|
105
- | **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
106
- | | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | — | — | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
107
- | | e5-base (original) | 0.001866 | -0.0042 | 0.0297 | — | — | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
108
- | **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | — | — | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
109
- | | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | — | — | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
110
- | | e5-base (original) | 0.001867 | -0.0027 | 0.0293 | — | — | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
111
- | **EN Benchmark** (MS MARCO) | Teacher (BGE-M3) | — | 0.6710 | 0.0724 | 0.5575 | 0.0676 | 0.6362 | 0.4385 | 0.9205 | 1.0000 |
112
- | | Student (E5-distilled) | — | 0.7233 | 0.0670 | 0.6269 | 0.0615 | 0.5912 | 0.3745 | 0.9130 | 1.0000 |
113
- | | e5-base (original) | — | 0.8886 | 0.0259 | 0.8427 | 0.0264 | 0.6852 | 0.5100 | 0.9380 | 1.0000 |
114
- | **RU Benchmark** (SberQuad) | Teacher (BGE-M3) | — | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 | 0.5990 | 1.0000 | 1.0000 |
115
- | | Student (E5-distilled) | — | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 | 0.6525 | 1.0000 | 1.0000 |
116
- | | e5-base (original) | — | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 | 0.4915 | 1.0000 | 1.0000 |
117
 
118
- ---
119
-
120
- ### Summary (EN)
121
- - Student closely reproduces the teacher’s embedding space.
122
- - Slight drop in Recall@1 (−6.4 p.p.), but Recall@10 remains perfect.
123
- - Student embeddings are **more compact**, with slightly higher cosine similarities.
124
- - Original e5-base is incompatible with teacher’s space.
125
 
126
  ---
127
 
128
- ### 🔹 Russian Benchmark — SberQuad
129
-
130
- | Model | Recall@1 | Recall@5 | Recall@10 | Cosine_Pos | Cosine_Pos_std | Cosine_Neg | Cosine_Neg_std | MRR |
131
- |------------------------|----------|----------|-----------|------------|----------------|------------|----------------|-------|
132
- | Teacher (USER-BGE-M3) | 0.5990 | 1.0000 | 1.0000 | 0.6070 | 0.0871 | 0.5790 | 0.1140 | 0.7995 |
133
- | Student (E5-distilled) | 0.6525 | 1.0000 | 1.0000 | 0.6716 | 0.0777 | 0.6435 | 0.1016 | 0.8263 |
134
- | multilingual-e5-base | 0.4915 | 1.0000 | 1.0000 | 0.8467 | 0.0323 | 0.8412 | 0.0426 | 0.7458 |
135
- | **Δ Student–Teacher** | +0.0535 | 0.0000 | 0.0000 | +0.0646 | −0.0093 | +0.0645 | −0.0125 | +0.0268 |
136
 
137
- ### Summary (RU)
138
- - Student surpasses teacher on Recall@1 (+5.35 p.p.) and MRR (+2.7 p.p.).
139
- - Embedding space is more semantically consistent (higher cosine values).
140
- - Base e5 remains incompatible, as expected.
141
 
142
  ---
143
 
144
  ### 🔹 Conclusions
145
 
146
- - 🧩 **Distillation succeeded:** the student replicates the teacher’s embedding space closely.
147
- - 🇷🇺 On Russian, the student **outperforms the teacher**, with better Recall@1 and MRR.
148
- - 🇬🇧 On English, student performance is nearly identical, with minimal Recall@1 drop.
149
- - Base e5 embeddings remain incompatible with teacher space.
150
- - 🔄 **Final Result:** a bilingual, lightweight student preserving teacher quality, without prefix requirements.
151
 
152
  ---
153
 
@@ -167,4 +134,4 @@ Note: EN/RU Benchmarks are external datasets used to test retrieval performance;
167
  from sentence_transformers import SentenceTransformer
168
 
169
  model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
170
- embeddings = model.encode(["Hello world", "Привет мир"], normalize
 
37
 
38
  ---
39
 
40
+ **About Distillation:**
41
  The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
42
+ To achieve this:
43
 
44
  - Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
45
+ - Student embeddings were trained to minimize the **MSE** with the teacher’s embeddings.
46
+ - A projection layer (768→1024) was added to match the dimensionality of the teacher model.
47
+ - **No prefixes (like “query:” or “passage:”)** were used — the student encodes sentences directly.
48
 
49
  ---
50
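The distillation recipe above (precomputed teacher targets, a 768→1024 projection layer, plain MSE) can be sketched as follows. This is a minimal illustration with hypothetical names and random tensors standing in for real embeddings, not the actual training code behind this model:

```python
import torch
import torch.nn as nn

class ProjectedStudent(nn.Module):
    """Hypothetical wrapper: maps 768-d student embeddings into the
    teacher's 1024-d space via a single linear projection."""

    def __init__(self, student_dim: int = 768, teacher_dim: int = 1024):
        super().__init__()
        self.projection = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_embeddings: torch.Tensor) -> torch.Tensor:
        return self.projection(student_embeddings)

# Stand-ins for one batch: pooled student outputs (no "query:"/"passage:"
# prefixes) and teacher embeddings precomputed with Deepvk/USER-BGE-M3.
student_emb = torch.randn(32, 768)
teacher_emb = torch.randn(32, 1024)

model = ProjectedStudent()
# The training objective: plain MSE against the frozen teacher targets.
loss = nn.functional.mse_loss(model(student_emb), teacher_emb)
loss.backward()  # gradients flow into the projection (and, in training, the student encoder)
```

In the real setup the student encoder's parameters would be updated alongside the projection; here only the projection is shown for brevity.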
 
 
84
 
85
  ## 📊 Evaluation Results
86
 
87
+ The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets.
88
 
89
  ---
90
 
91
+ ### 🔹 TL;DR
92
 
93
+ - The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **very high fidelity**.
94
+ - The **original E5-base** embeddings are **incompatible** with the BGE space (cosine ≈ 0).
95
+ - **Recall@1: 86% (Student)** vs **87.7% (Teacher)** — nearly identical retrieval performance.
96
 
97
  ---
98
 
99
+ ### 🔹 Main Metrics
100
 
101
+ | Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
102
+ |--------------|--------------------|----------:|-------------:|------------:|--------:|----------:|----------:|----------:|
103
+ | **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
104
+ | | **Student (E5-distilled)** | **0.000288** | **0.8389** | **0.0498** | **0.9158** | **0.8607** | **0.9829** | **0.9955** |
105
+ | | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
106
+ | **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
107
+ | | **Student (E5-distilled)** | **0.000276** | **0.8462** | **0.0425** | **0.9176** | **0.8608** | **0.9896** | **0.9956** |
108
+ | | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
109
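The MRR and Recall@k columns in the table above can be reproduced from a query-passage similarity matrix. A minimal sketch with a hypothetical helper, assuming each query has exactly one relevant passage (matched by index):

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray, ks=(1, 5, 10)):
    """Compute MRR and Recall@k, assuming the relevant passage for
    query i sits at column i of the similarity matrix."""
    order = np.argsort(-sim, axis=1)  # passage indices, best match first
    # Rank (0-based) of the correct passage for each query.
    ranks = np.argmax(order == np.arange(len(sim))[:, None], axis=1)
    mrr = float(np.mean(1.0 / (ranks + 1)))
    recalls = {k: float(np.mean(ranks < k)) for k in ks}
    return mrr, recalls

# Toy 3x3 similarity matrix: correct passages on the diagonal;
# query 1's correct passage is only ranked second.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.7, 0.8],
                [0.1, 0.7, 0.95]])

mrr, recalls = retrieval_metrics(sim)
# mrr = (1 + 1/2 + 1) / 3 ≈ 0.833, recalls[1] = 2/3
```

The evaluation behind the table presumably works on cosine similarities between (normalized) query and passage embeddings; any ranking-compatible score works with the same logic.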
 
110
  ---
111
 
112
  ### 🔹 Conclusions
113
 
114
+ - **Student ≈ Teacher** — the distilled model learned the teacher’s semantic space almost perfectly.
115
+ - **Original E5 ≠ Teacher** — default E5 embeddings are unrelated to BGE’s space.
116
+ - 📈 **Stable generalization:** validation and test results match closely.
117
+ - 🧩 The new student is a **drop-in BGE-compatible encoder**, with **no prefix requirement**.
118
 
119
  ---
120
 
 
134
  from sentence_transformers import SentenceTransformer
135
 
136
  model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
137
+ embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
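Because `normalize_embeddings=True` returns unit-length vectors, cosine similarity between encoded sentences reduces to a dot product. A toy illustration with stand-in unit vectors (no model download needed):

```python
import numpy as np

# Two hand-picked unit vectors standing in for normalized model outputs.
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

assert np.isclose(np.linalg.norm(a), 1.0)  # unit length, like normalized embeddings
cosine = float(a @ b)  # 0.6*0.8 + 0.8*0.6 = 0.96
```

On real output, `embeddings @ embeddings.T` gives the full cosine-similarity matrix in one step.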