
🧩 Student-Distilled Sentence Embeddings: Deepvk/USER-bge-m3 → intfloat/multilingual-e5-base

✨ This repository contains a student model distilled from Deepvk/USER-BGE-M3 using intfloat/multilingual-e5-base as the base encoder.
The model is designed for semantic search, retrieval, and sentence similarity tasks in Russian 🇷🇺 and English 🇬🇧, and is optimized for practical use without prefixes.


πŸ” Model Card

| Property | Value |
|---|---|
| Teacher Model | Deepvk/USER-BGE-M3 |
| Base Model | intfloat/multilingual-e5-base |
| Distillation Type | Embedding-level distillation (teacher → student) |
| Embedding Dim | 1024 |
| Projection | Dense layer (768 → 1024) |
| Loss Function | Mean Squared Error (MSE) |
| Libraries | sentence-transformers, torch |
| License | Apache-2.0 |
| Hardware | CPU & GPU supported |

About Distillation:
The model was trained to replicate the embedding space of Deepvk/USER-BGE-M3, while maintaining the simplicity and flexibility of E5.
To achieve this:

  • Teacher embeddings were precomputed with Deepvk/USER-BGE-M3.
  • Student embeddings were trained to minimize the MSE against the teacher's embeddings.
  • A projection layer (768 → 1024) was added to match the dimensionality of the teacher model.
  • No prefixes (such as "query:" or "passage:") are used; the student encodes sentences directly (see the sketch after this list).
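The sketch below illustrates this setup end to end. It is a minimal, illustrative sketch rather than the exact training script: the placeholder batch, the default activation of the Dense head, and the lower-case `deepvk/USER-bge-m3` checkpoint id are assumptions.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Student: multilingual-e5-base encoder + mean pooling + Dense 768 -> 1024 projection.
word = models.Transformer("intfloat/multilingual-e5-base")
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
proj = models.Dense(in_features=768, out_features=1024)  # activation left at the library default (assumption)
student = SentenceTransformer(modules=[word, pool, proj])

# Teacher whose embedding space the student should replicate.
teacher = SentenceTransformer("deepvk/USER-bge-m3")

texts = ["Как вернуть заказ?", "How do I return an order?"]  # placeholder batch

# 1) Teacher embeddings are precomputed once and used as fixed regression targets.
with torch.no_grad():
    targets = teacher.encode(texts, convert_to_tensor=True)

# 2) Student forward pass; note that no "query:" / "passage:" prefixes are added.
features = student.tokenize(texts)
student_emb = student(features)["sentence_embedding"]

# 3) Embedding-level distillation: MSE between student and teacher vectors.
loss = torch.nn.functional.mse_loss(student_emb, targets)
loss.backward()
```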

🚀 Features

  • ⚡ Fast inference: optimized E5-base architecture with no prefix processing
  • 🧠 High-quality semantic understanding: inherits BGE's retrieval capability
  • 🌍 Multilingual (RU/EN): strong in Russian, solid in English
  • 🔄 Teacher-compatible: embeddings align closely with Deepvk/USER-BGE-M3
  • 🛠 Sentence-transformers ready: plug-and-play for semantic search, clustering, and retrieval

🧠 Intended Use

✅ Recommended for:

  • Semantic search and retrieval systems
  • Text embedding and similarity pipelines
  • Multilingual tasks focused on Russian and English
  • Clustering and topic discovery

❌ Not ideal for:

  • Prefix-based retrieval setups (e.g., original E5 behavior)
  • Cross-encoder scoring tasks

📚 Training Details

  • Training Objective: Mimic teacher embeddings (Deepvk/USER-BGE-M3)
  • Dataset Composition: Retrieval/Semantic ratio = 60/40
  • Language Distribution: Russian / English ≈ 80 / 20
  • Training Duration: 5 epochs with warmup and cosine evaluation
  • Optimizer: AdamW with automatic mixed precision (AMP); a representative update step is sketched below
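Only the optimizer family and the use of AMP are stated above, so the following is a hedged sketch of one mixed-precision update step, assuming a CUDA device; the learning rate and the `train_step` helper are placeholders, not the card's exact settings.

```python
import torch
from sentence_transformers import SentenceTransformer, models

# Student stack as in the distillation sketch above.
word = models.Transformer("intfloat/multilingual-e5-base")
pool = models.Pooling(word.get_word_embedding_dimension())
proj = models.Dense(in_features=768, out_features=1024)
student = SentenceTransformer(modules=[word, pool, proj], device="cuda")

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)  # learning rate is a placeholder
scaler = torch.cuda.amp.GradScaler()

def train_step(texts, teacher_targets):
    """One AMP update: forward in mixed precision, scaled backward, AdamW step."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        features = {k: v.to(student.device) for k, v in student.tokenize(texts).items()}
        pred = student(features)["sentence_embedding"]
        loss = torch.nn.functional.mse_loss(pred, teacher_targets.to(student.device))
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients do not underflow
    scaler.step(optimizer)         # unscales gradients, then applies the AdamW update
    scaler.update()
    return loss.item()
```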

📊 Evaluation Results

The model was evaluated against the teacher (Deepvk/USER-BGE-M3) and the original intfloat/multilingual-e5-base on validation and test datasets.


🔹 TL;DR

  • The distilled E5-base student reproduces the Deepvk/USER-BGE-M3 embedding space with very high fidelity.
  • The original E5-base embeddings are incompatible with the BGE space (cosine ≈ 0).
  • Recall@1: 86% (Student) vs 87.7% (Teacher), i.e. nearly identical retrieval performance.

🔹 Main Metrics

| Split | Model | MSE | Cosine mean | Cosine std | MRR | Recall@1 | Recall@5 | Recall@10 |
|---|---|---|---|---|---|---|---|---|
| Validation | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9244 | 0.8746 | 0.9851 | 0.9966 |
| Validation | Student (E5-distilled) | 0.000288 | 0.8389 | 0.0498 | 0.9158 | 0.8607 | 0.9829 | 0.9955 |
| Validation | E5-base (original) | 0.001866 | -0.0042 | 0.0297 | 0.0003 | 0.0000 | 0.0002 | 0.0003 |
| Test | Teacher (BGE-M3) | 0.000000 | 1.0000 | 0.0000 | 0.9273 | 0.8771 | 0.9908 | 0.9962 |
| Test | Student (E5-distilled) | 0.000276 | 0.8462 | 0.0425 | 0.9176 | 0.8608 | 0.9896 | 0.9956 |
| Test | E5-base (original) | 0.001867 | -0.0027 | 0.0293 | 0.0002 | 0.0000 | 0.0001 | 0.0002 |
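As a rough way to reproduce the alignment columns (MSE, cosine mean/std) on your own texts, the snippet below compares teacher and student embeddings pairwise. The sample sentences and the `deepvk/USER-bge-m3` checkpoint id are assumptions, and the retrieval metrics (MRR, Recall@k) additionally require the evaluation corpus, which is not included here.

```python
import torch
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("deepvk/USER-bge-m3")                  # teacher checkpoint id assumed
student = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

sentences = ["Как оформить возврат товара?", "How do I return an item?"]  # placeholder texts

t = teacher.encode(sentences, convert_to_tensor=True)
s = student.encode(sentences, convert_to_tensor=True)

mse = torch.mean((s - t) ** 2).item()                                # embedding-level MSE
cos = torch.nn.functional.cosine_similarity(s, t, dim=1)             # per-sentence cosine
print(f"MSE: {mse:.6f}  cosine mean: {cos.mean().item():.4f}  cosine std: {cos.std().item():.4f}")
```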

🔹 Conclusions

  • ✅ Student ≈ Teacher: the distilled model learned the teacher's semantic space almost perfectly.
  • ❌ Original E5 ≠ Teacher: default E5 embeddings are unrelated to BGE's space.
  • 📈 Stable generalization: validation and test results match closely.
  • 🧩 The new student is a drop-in BGE-compatible encoder with no prefix requirement.

📂 Model Structure

  • USER-BGE-M3-E5-Base-Distilled: trained model folder containing:
    • Transformer encoder (intfloat/multilingual-e5-base)
    • Pooling layer
    • Dense projection layer (768 → 1024)
  • Fully compatible with the sentence-transformers API (see the check below).
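A quick, illustrative way to confirm this structure after loading the model:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
print(model)                                      # lists the Transformer, Pooling and Dense modules
print(model.get_sentence_embedding_dimension())   # 1024 after the 768 -> 1024 projection
```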

🧩 Using the Model

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")
embeddings = model.encode(["Hello world", "Привет мир"], normalize_embeddings=True)
```
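A small retrieval-style follow-up, with placeholder texts, showing how the embeddings can be scored with cosine similarity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("skatzR/USER-BGE-M3-E5-Base-Distilled")

corpus = ["Как оформить возврат товара?", "Delivery usually takes 3-5 days.", "Оплата картой онлайн."]
query = "How can I return a product?"

corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]   # cosine scores, shape (len(corpus),)
best = scores.argmax().item()
print(corpus[best], scores[best].item())          # best-matching corpus sentence and its score
```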