---
license: apache-2.0
base_model:
- intfloat/multilingual-e5-base
language:
- ru
- en
tags:
- sentence-embeddings
- semantic-search
- distillation
- student-model
- multilingual
---
[Model on Hugging Face](https://huggingface.co/skatzR/USER-BGE-M3-E5-Base-Distilled)
# Student-Distilled Sentence Embeddings: Deepvk/USER-bge-m3 → intfloat/multilingual-e5-base
This repository contains a **student model distilled from [`Deepvk/USER-BGE-M3`](https://huggingface.co/deepvk/USER-bge-m3)** using [`intfloat/multilingual-e5-base`](https://huggingface.co/intfloat/multilingual-e5-base) as the base encoder.
The model is designed for **semantic search**, **retrieval**, and **sentence similarity** tasks in **Russian** and **English**, optimized for **practical use without prefixes**.
---
# Model Card
| Property | Value |
|--------------------|----------------------------------------------------------------------|
| **Teacher Model** | [`Deepvk/USER-BGE-M3`](https://huggingface.co/deepvk/USER-bge-m3) |
| **Base Model** | [`intfloat/multilingual-e5-base`](https://huggingface.co/intfloat/multilingual-e5-base) |
| **Distillation Type** | Embedding-level distillation (teacher → student) |
| **Embedding Dim** | 1024 |
| **Projection** | Dense layer (768 → 1024) |
| **Loss Function** | Mean Squared Error (MSE) |
| **Libraries** | `sentence-transformers`, `torch` |
| **License** | Apache-2.0 |
| **Hardware** | CPU & GPU supported |
---
**About Distillation:**
The model was trained to **replicate the embedding space of Deepvk/USER-BGE-M3**, while maintaining the simplicity and flexibility of E5.
To achieve this:
- Teacher embeddings were precomputed with `Deepvk/USER-BGE-M3`.
- Student embeddings were trained to minimize the **MSE** against the teacher's embeddings.
- A projection layer (768 → 1024) was added to match the dimensionality of the teacher model.
- **No prefixes (such as "query:" or "passage:")** were used; the student encodes sentences directly (see the sketch below).
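
The exact training script and data are not published here, so the following is only a minimal sketch of the setup described above. The pooling mode, the identity activation on the projection layer, and the example sentences are assumptions for illustration.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Teacher: used once to precompute the 1024-dim target embeddings.
teacher = SentenceTransformer("deepvk/USER-bge-m3")

# Student: E5-base encoder + pooling + Dense 768 -> 1024 projection,
# so its outputs live in the same space as the teacher's.
word_emb = models.Transformer("intfloat/multilingual-e5-base")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), pooling_mode="mean")
projection = models.Dense(
    in_features=pooling.get_sentence_embedding_dimension(),  # 768
    out_features=1024,
    activation_function=torch.nn.Identity(),  # assumption: linear projection
)
student = SentenceTransformer(modules=[word_emb, pooling, projection])

# Placeholder corpus; the real training data is not part of this repository.
sentences = ["Пример предложения.", "An example sentence."]
teacher_embeddings = teacher.encode(sentences, convert_to_numpy=True)
```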
---
## Features
- **Fast inference**: optimized E5-base architecture with no prefix processing
- **High-quality semantic understanding**: inherits BGE's retrieval capability
- **Multilingual (RU/EN)**: strong in Russian, solid in English
- **Teacher-compatible**: embeddings align closely with Deepvk/USER-BGE-M3
- **Sentence-transformers ready**: plug-and-play for semantic search, clustering, and retrieval
---
## Intended Use
**✅ Recommended for:**
- Semantic search and retrieval systems (a short example follows below)
- Text embedding and similarity pipelines
- Multilingual tasks focused on Russian and English
- Clustering and topic discovery
**❌ Not ideal for:**
- Prefix-based retrieval setups (e.g., the original E5 prefix behavior)
- Cross-encoder scoring tasks
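
A minimal semantic-search sketch; the corpus and query below are illustrative placeholders:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

# No "query:" / "passage:" prefixes are needed.
corpus = [
    "Как оформить возврат товара?",
    "Delivery usually takes 3-5 business days.",
    "Оплата возможна картой или наличными.",
]
query = "How long does shipping take?"

corpus_emb = model.encode(corpus, normalize_embeddings=True, convert_to_tensor=True)
query_emb = model.encode(query, normalize_embeddings=True, convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries for the query.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```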
---
## Training Details
- **Training Objective:** Mimic teacher embeddings (Deepvk/USER-BGE-M3)
- **Dataset Composition:** Retrieval / semantic ratio = 60 / 40
- **Language Distribution:** Russian / English ≈ 80 / 20
- **Training Duration:** 5 epochs with warmup and cosine evaluation
- **Optimizer:** AdamW with automatic mixed precision (AMP); a training-loop sketch follows below
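
Continuing the sketch from the distillation section, these settings map onto the `sentence-transformers` fit API roughly as follows. The batch size, learning rate, and warmup steps are illustrative guesses, not the published hyperparameters:

```python
import torch
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, losses

# Each example carries the sentence plus its precomputed teacher embedding as the label.
train_examples = [
    InputExample(texts=[sent], label=emb)
    for sent, emb in zip(sentences, teacher_embeddings)
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)

# MSE between student output and teacher embedding (embedding-level distillation).
train_loss = losses.MSELoss(model=student)

student.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=5,
    warmup_steps=1000,            # assumption
    optimizer_class=torch.optim.AdamW,
    optimizer_params={"lr": 2e-5},  # assumption
    use_amp=True,                 # automatic mixed precision
)
```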
---
## Evaluation Results
The model was evaluated against the **teacher (`Deepvk/USER-BGE-M3`)** and the **original `intfloat/multilingual-e5-base`** on validation and test datasets.
---
### TL;DR
- The **distilled E5-base student** reproduces the **Deepvk/USER-BGE-M3** embedding space with **very high fidelity**.
- The **original E5-base** embeddings are **incompatible** with the BGE space (cosine ≈ 0).
- **Recall@1: 86% (Student)** vs **87.7% (Teacher)**: nearly identical retrieval performance.
---
### Main Metrics
| Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
|--------------|--------------------|----------:|-------------:|------------:|--------:|----------:|----------:|----------:|
| **Validation** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| | **Student (E5-distilled)** | **0.000288** | **0.8389** | **0.0498** | **0.9158** | **0.8607** | **0.9829** | **0.9955** |
| | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| **Test** | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| | **Student (E5-distilled)** | **0.000276** | **0.8462** | **0.0425** | **0.9176** | **0.8608** | **0.9896** | **0.9956** |
| | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
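
As an illustration, the alignment numbers (MSE and mean cosine between student and teacher embeddings) can be recomputed on your own sentences roughly as follows. `eval_sentences` is a placeholder, and whether embeddings were L2-normalized before computing MSE in the original evaluation is an assumption:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("deepvk/USER-bge-m3")
student = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

eval_sentences = ["Пример текста для проверки.", "A sample sentence for evaluation."]

t = teacher.encode(eval_sentences, normalize_embeddings=True)
s = student.encode(eval_sentences, normalize_embeddings=True)

mse = float(np.mean((t - s) ** 2))
cosine_mean = float(np.mean(np.sum(t * s, axis=1)))  # valid because embeddings are L2-normalized
print(f"MSE: {mse:.6f}, mean cosine: {cosine_mean:.4f}")
```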
---
### Conclusions
- ✅ **Student ≈ Teacher**: the distilled model learned the teacher's semantic space almost perfectly.
- ❌ **Original E5 ≠ Teacher**: default E5 embeddings are unrelated to BGE's space.
- **Stable generalization**: validation and test results match closely.
- The new student is a **drop-in BGE-compatible encoder** with **no prefix requirement**.
---
## Model Structure
- `USER-BGE-M3-E5-Base-Distilled`: trained model folder containing:
  - Transformer encoder (`intfloat/multilingual-e5-base`)
  - Pooling layer
  - Dense projection layer (768 → 1024)
- Fully compatible with `sentence-transformers` API.
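
You can verify this module stack after loading the model (a quick check, assuming the repository downloads normally):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

# Prints the Transformer -> Pooling -> Dense(768 -> 1024) module stack.
print(model)
# Confirms the 1024-dimensional output space shared with the teacher.
print(model.get_sentence_embedding_dimension())
```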
---
## Using the Model
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
```
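
To compare the two sentences, cosine similarity can be computed directly on the normalized embeddings (a short follow-up to the snippet above):

```python
from sentence_transformers import util

# Cosine similarity between the English and Russian sentences.
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(float(similarity))
```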