
🧩 Student-Distilled Sentence Embeddings: Deepvk/USER-bge-m3 → intfloat/multilingual-e5-base

✨ This repository contains a student model distilled from Deepvk/USER-BGE-M3 using intfloat/multilingual-e5-base as the base encoder.
The model is designed for semantic search, retrieval, and sentence similarity tasks in Russian 🇷🇺 and English 🇬🇧, and is optimized for practical use without prefixes.


πŸ” Model Card

| Property | Value |
|---|---|
| Teacher Model | Deepvk/USER-BGE-M3 |
| Base Model | intfloat/multilingual-e5-base |
| Distillation Type | Embedding-level distillation (teacher → student) |
| Embedding Dim | 1024 |
| Projection | Dense layer (768 → 1024) |
| Loss Function | Mean Squared Error (MSE) |
| Libraries | sentence-transformers, torch |
| License | Apache-2.0 |
| Hardware | CPU & GPU supported |

About Distillation:
The model was trained to replicate the embedding space of Deepvk/USER-BGE-M3, while maintaining the simplicity and flexibility of E5.
To achieve this:

  • Teacher embeddings were precomputed with Deepvk/USER-BGE-M3.
  • Student embeddings were trained to minimize the MSE against the teacher's embeddings.
  • A projection layer (768 → 1024) was added to match the dimensionality of the teacher model.
  • No prefixes (such as "query:" or "passage:") are used; the student encodes sentences directly (see the sketch after this list).
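The sketch below illustrates this setup end to end. It is a minimal, illustrative sketch rather than the exact training script: the placeholder batch, the default activation of the Dense head, and the lower-case `deepvk/USER-bge-m3` checkpoint id are assumptions.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Student: multilingual-e5-base encoder + mean pooling + Dense 768 -> 1024 projection.
word = models.Transformer("intfloat/multilingual-e5-base")
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
proj = models.Dense(in_features=768, out_features=1024)  # activation left at the library default (assumption)
student = SentenceTransformer(modules=[word, pool, proj])

# Teacher whose embedding space the student should replicate.
teacher = SentenceTransformer("deepvk/USER-bge-m3")

texts = ["Как вернуть заказ?", "How do I return an order?"]  # placeholder batch

# 1) Teacher embeddings are precomputed once and used as fixed regression targets.
with torch.no_grad():
    targets = teacher.encode(texts, convert_to_tensor=True)

# 2) Student forward pass; note that no "query:" / "passage:" prefixes are added.
features = student.tokenize(texts)
student_emb = student(features)["sentence_embedding"]

# 3) Embedding-level distillation: MSE between student and teacher vectors.
loss = torch.nn.functional.mse_loss(student_emb, targets)
loss.backward()
```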

🚀 Features

  • ⚡ Fast inference: optimized E5-base architecture with no prefix processing
  • 🧠 High-quality semantic understanding: inherits BGE's retrieval capability
  • 🌍 Multilingual (RU/EN): strong in Russian, solid in English
  • 🔄 Teacher-compatible: embeddings align closely with Deepvk/USER-BGE-M3
  • 🛠 Sentence-transformers ready: plug-and-play for semantic search, clustering, and retrieval

🧠 Intended Use

✅ Recommended for:

  • Semantic search and retrieval systems
  • Text embedding and similarity pipelines
  • Multilingual tasks focused on Russian and English
  • Clustering and topic discovery

❌ Not ideal for:

  • Prefix-based retrieval setups (e.g., original E5 behavior)
  • Cross-encoder scoring tasks

📚 Training Details

  • Training Objective: Mimic teacher embeddings (Deepvk/USER-BGE-M3)
  • Dataset Composition: Retrieval/Semantic ratio = 60/40
  • Language Distribution: Russian / English ≈ 80 / 20
  • Training Duration: 5 epochs with warmup and cosine evaluation
  • Optimizer: AdamW with automatic mixed precision (AMP); a representative update step is sketched below
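Only the optimizer family and the use of AMP are stated above, so the following is a hedged sketch of one mixed-precision update step, assuming a CUDA device; the learning rate and the `train_step` helper are placeholders, not the card's exact settings.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Student stack as in the distillation sketch above.
word = models.Transformer("intfloat/multilingual-e5-base")
pool = models.Pooling(word.get_word_embedding_dimension())
proj = models.Dense(in_features=768, out_features=1024)
student = SentenceTransformer(modules=[word, pool, proj], device="cuda")

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)  # learning rate is a placeholder
scaler = torch.cuda.amp.GradScaler()

def train_step(texts, teacher_targets):
    """One AMP update: forward in mixed precision, scaled backward, AdamW step."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        features = {k: v.to(student.device) for k, v in student.tokenize(texts).items()}
        pred = student(features)["sentence_embedding"]
        loss = torch.nn.functional.mse_loss(pred, teacher_targets.to(student.device))
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients do not underflow
    scaler.step(optimizer)         # unscales gradients, then applies the AdamW update
    scaler.update()
    return loss.item()
```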

📊 Evaluation Results

The model was evaluated against the teacher (Deepvk/USER-BGE-M3) and the original intfloat/multilingual-e5-base on validation and test datasets.


🔹 TL;DR

  • The distilled E5-base student reproduces the Deepvk/USER-BGE-M3 embedding space with very high fidelity.
  • The original E5-base embeddings are incompatible with the BGE space (cosine ≈ 0).
  • Recall@1: 86% (Student) vs 87.7% (Teacher), i.e. nearly identical retrieval performance.

🔹 Main Metrics

| Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
|---|---|---|---|---|---|---|---|---|
| Validation | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| Validation | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
| Validation | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| Test | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| Test | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
| Test | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
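As a rough way to reproduce the alignment columns (MSE, cosine mean/std) on your own texts, the snippet below compares teacher and student embeddings pairwise. The sample sentences and the `deepvk/USER-bge-m3` checkpoint id are assumptions, and the retrieval metrics (MRR, Recall@k) additionally require the evaluation corpus, which is not included here.

```python
import torch
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("deepvk/USER-bge-m3")                  # teacher checkpoint id assumed
student = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

sentences = ["Как оформить возврат товара?", "How do I return an item?"]  # placeholder texts

t = teacher.encode(sentences, convert_to_tensor=True)
s = student.encode(sentences, convert_to_tensor=True)

mse = torch.mean((s - t) ** 2).item()                                # embedding-level MSE
cos = torch.nn.functional.cosine_similarity(s, t, dim=1)             # per-sentence cosine
print(f"MSE: {mse:.6f}  cosine mean: {cos.mean().item():.4f}  cosine std: {cos.std().item():.4f}")
```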

🔹 Conclusions

  • ✅ Student ≈ Teacher: the distilled model learned the teacher's semantic space almost perfectly.
  • ❌ Original E5 ≠ Teacher: default E5 embeddings are unrelated to BGE's space.
  • 📈 Stable generalization: validation and test results match closely.
  • 🧩 The new student is a drop-in BGE-compatible encoder with no prefix requirement.

📂 Model Structure

  • USER-BGE-M3-E5-Base-Distilled: trained model folder containing:
    • Transformer encoder (intfloat/multilingual-e5-base)
    • Pooling layer
    • Dense projection layer (768 → 1024)
  • Fully compatible with the sentence-transformers API (see the check below).
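A quick, illustrative way to confirm this structure after loading the model:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
print(model)                                      # lists the Transformer, Pooling and Dense modules
print(model.get_sentence_embedding_dimension())   # 1024 after the 768 -> 1024 projection
```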

🧩 Using the Model

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
```
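A small retrieval-style follow-up, with placeholder texts, showing how the embeddings can be scored with cosine similarity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

corpus = ["Как оформить возврат товара?", "Delivery usually takes 3-5 days.", "Оплата картой онлайн."]
query = "How can I return a product?"

corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]   # cosine scores, shape (len(corpus),)
best = scores.argmax().item()
print(corpus[best], scores[best].item())          # best-matching corpus sentence and its score
```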