# Student-Distilled Sentence Embeddings: Deepvk/USER-bge-m3 → intfloat/multilingual-e5-base

This repository contains a student model distilled from Deepvk/USER-BGE-M3 using intfloat/multilingual-e5-base as the base encoder.
The model is designed for semantic search, retrieval, and sentence similarity tasks in Russian and English, and is optimized for practical use without prefixes.

## Model Card
| Property | Value |
|---|---|
| Teacher Model | Deepvk/USER-BGE-M3 |
| Base Model | intfloat/multilingual-e5-base |
| Distillation Type | Embedding-level distillation (teacher → student) |
| Embedding Dim | 1024 |
| Projection | Dense layer (768 → 1024) |
| Loss Function | Mean Squared Error (MSE) |
| Libraries | `sentence-transformers`, `torch` |
| License | Apache-2.0 |
| Hardware | CPU & GPU supported |
### About Distillation

The model was trained to replicate the embedding space of Deepvk/USER-BGE-M3 while maintaining the simplicity and flexibility of E5.
To achieve this:

- Teacher embeddings were precomputed with Deepvk/USER-BGE-M3.
- Student embeddings were trained to minimize the MSE with the teacher's embeddings.
- A projection layer (768 → 1024) was added to match the dimensionality of the teacher model.
- No prefixes (like "query:" or "passage:") were used; the student encodes sentences directly.
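
The sketch below illustrates this setup. It is a minimal, illustrative version of the objective: the example texts, the external `torch.nn.Linear` projection, and the way the models are driven are assumptions, not the exact training code.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Illustrative sketch of embedding-level distillation (details are assumptions).
teacher = SentenceTransformer("deepvk/USER-bge-m3")             # frozen teacher (1024-dim embeddings)
student = SentenceTransformer("intfloat/multilingual-e5-base")  # trainable student (768-dim embeddings)
projection = torch.nn.Linear(768, 1024)                         # maps the student space onto the teacher space

texts = ["пример предложения", "an example sentence"]           # stand-in for a training batch

# 1) Precompute teacher embeddings; no gradients flow through the teacher.
teacher_emb = teacher.encode(texts, convert_to_tensor=True)

# 2) Run the student with gradients enabled and project its output to 1024 dims.
features = student.tokenize(texts)
student_emb = student(features)["sentence_embedding"]

# 3) Minimize the MSE between projected student embeddings and teacher embeddings.
loss = F.mse_loss(projection(student_emb), teacher_emb)
loss.backward()
```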

## Features

- Fast inference: optimized E5-base architecture with no prefix processing
- High-quality semantic understanding: inherits BGE's retrieval capability
- Multilingual (RU/EN): strong in Russian, solid in English
- Teacher-compatible: embeddings align closely with Deepvk/USER-BGE-M3
- Sentence-transformers ready: plug-and-play for semantic search, clustering, and retrieval

## Intended Use

✅ Recommended for:

- Semantic search and retrieval systems
- Text embedding and similarity pipelines
- Multilingual tasks focused on Russian and English
- Clustering and topic discovery

❌ Not ideal for:

- Prefix-based retrieval setups (e.g., original E5 behavior)
- Cross-encoder scoring tasks

## Training Details

- Training Objective: Mimic teacher embeddings (Deepvk/USER-BGE-M3)
- Dataset Composition: Retrieval/Semantic ratio = 60/40
- Language Distribution: Russian / English ≈ 80 / 20
- Training Duration: 5 epochs with warmup and cosine evaluation
- Optimizer: AdamW with automatic mixed precision (AMP)
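
A minimal sketch of how such an AdamW + AMP distillation step could look; the learning rate, weight decay, and data handling are assumptions, not the recorded training configuration.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Hypothetical training step combining AdamW, AMP, and the MSE objective described above.
student = SentenceTransformer("intfloat/multilingual-e5-base")
projection = torch.nn.Linear(768, 1024)
optimizer = torch.optim.AdamW(
    list(student.parameters()) + list(projection.parameters()),
    lr=2e-5, weight_decay=0.01,                      # assumed hyperparameters
)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

def distillation_step(texts, teacher_emb):
    """One step: student forward, 768 -> 1024 projection, MSE against precomputed teacher embeddings."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        student_emb = student(student.tokenize(texts))["sentence_embedding"]
        loss = F.mse_loss(projection(student_emb), teacher_emb)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```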

## Evaluation Results

The model was evaluated against the teacher (Deepvk/USER-BGE-M3) and the original intfloat/multilingual-e5-base on validation and test datasets.

### TL;DR

- The distilled E5-base student reproduces the Deepvk/USER-BGE-M3 embedding space with very high fidelity.
- The original E5-base embeddings are incompatible with the BGE space (cosine ≈ 0).
- Recall@1: 86% (Student) vs 87.7% (Teacher), nearly identical retrieval performance.

### Main Metrics

| Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
|---|---|---|---|---|---|---|---|---|
| Validation | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| Validation | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
| Validation | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| Test | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| Test | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
| Test | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |

### Conclusions

- ✅ Student ≈ Teacher: the distilled model learned the teacher's semantic space almost perfectly.
- ❌ Original E5 ≠ Teacher: default E5 embeddings are unrelated to BGE's space.
- Stable generalization: validation and test results match closely.
- The new student is a drop-in BGE-compatible encoder with no prefix requirement.
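
One way to check this compatibility is to encode the same sentences with both models and compare the embeddings. This is a minimal sketch: the sentences are arbitrary, and per the tables above the mean student/teacher cosine is roughly 0.84.

```python
from sentence_transformers import SentenceTransformer, util

# Compare student and teacher embeddings on a couple of arbitrary sentences.
student = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
teacher = SentenceTransformer("deepvk/USER-bge-m3")

sentences = ["Какая сегодня погода?", "What is the weather like today?"]
student_emb = student.encode(sentences, normalize_embeddings=True)
teacher_emb = teacher.encode(sentences, normalize_embeddings=True)

# The diagonal holds the per-sentence student/teacher cosine similarities.
print(util.cos_sim(student_emb, teacher_emb).diagonal())
```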

## Model Structure

`USER-BGE-M3-E5-Base-Distilled` is the trained model folder containing:

- Transformer encoder (`intfloat/multilingual-e5-base`)
- Pooling layer
- Dense projection layer (768 → 1024)

The model is fully compatible with the `sentence-transformers` API.
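
For illustration, a stack of this shape can be assembled with the sentence-transformers `models` API. The mean pooling mode and identity activation below are assumptions about the saved configuration; loading the published checkpoint (next section) remains the normal path.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Sketch of the module stack described above (pooling mode and activation are assumptions).
encoder = models.Transformer("intfloat/multilingual-e5-base")        # contextual token embeddings, 768-dim
pooling = models.Pooling(encoder.get_word_embedding_dimension(),     # sentence embedding via pooling
                         pooling_mode="mean")
dense = models.Dense(in_features=768, out_features=1024,             # projection into the 1024-dim teacher space
                     activation_function=torch.nn.Identity())

model = SentenceTransformer(modules=[encoder, pooling, dense])
print(model)  # shows Transformer -> Pooling -> Dense
```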

## Using the Model

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
```
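
For retrieval-style use, query/passage pairs can be scored directly with cosine similarity and no prefixes; the query and passages below are arbitrary illustrative examples.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

# No "query:" / "passage:" prefixes are needed; raw text is encoded directly.
query = "Как приготовить борщ?"
passages = [
    "Рецепт борща: свекла, капуста, картофель и говяжий бульон.",
    "The weather in Moscow is usually cold in winter.",
]

query_emb = model.encode(query, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

scores = util.cos_sim(query_emb, passage_emb)[0]   # cosine similarity of the query against each passage
best = int(scores.argmax())
print(passages[best], float(scores[best]))
```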