Qodo-Embed-M-1-1.5B-M2V-Distilled
This project optimizes the Qodo-Embed-1-1.5B model using Model2Vec, reducing its size and dramatically improving inference speed while maintaining most of its performance capabilities.
Overview
Qodo-Embed-1-1.5B is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain. While powerful, it can be resource-intensive for production use cases.
Model2Vec is a technique to distill large sentence transformer models into small, fast static embedding models. This project applies Model2Vec to create an optimized version of Qodo-Embed-1-1.5B with the following benefits:
- Smaller Size: Reduces model size by a factor of 25x
- Faster Inference: Up to 112x faster inference
- Low Resource Requirements: Minimal memory footprint and dependencies
- Maintains Performance: Retains most of the original model's capabilities
Model Information
- Model Name: Qodo-Embed-M-1-1.5B-M2V-Distilled
- Original Model: Qodo-Embed-1-1.5B
- Distillation Method: Model2Vec
- Original Dimensions: 1536
- Distilled Dimensions: 384
- Explained Variance: ~85%
- Size Reduction: 25.26x (from 5.9GB to 233MB)
- Speed Improvement: 112.14x faster
Installation
First, ensure you have the required dependencies:
# Install the base package
uv add --group model2vec model2vec 'model2vec[distill]' sentence-transformers transformers
# Install additional dependencies for evaluation
uv add --group model2vec matplotlib psutil
Usage
Distillation
To create a distilled version of Qodo-Embed-1-1.5B:
python models/qodo_embed_m2v/distill.py --pca_dims 384
Options:
--model_name
- Source model name (default: "Qodo/Qodo-Embed-1-1.5B")--output_dir
- Where to save the distilled model (default: "models/qodo_embed_m2v")--pca_dims
- Dimensions for PCA reduction; smaller values create faster but less accurate models (default: 384)--save_to_hub
- Push the model to HuggingFace Hub--hub_model_id
- Model ID for HuggingFace Hub (required if saving to hub)--skip_readme
- Skip generating README file (default: True)
Evaluation
To evaluate the distilled model against the original:
python models/qodo_embed_m2v/evaluate.py
Options:
--original_model
- Original model name (default: "Qodo/Qodo-Embed-1-1.5B")--distilled_model
- Path to the distilled model (default: "models/qodo_embed_m2v")--output_dir
- Where to save evaluation results (default: "models/qodo_embed_m2v/evaluation")
Example Code
from model2vec import StaticModel
from sentence_transformers import SentenceTransformer
import time
# Sample code for embedding
code_samples = [
"def process_data_stream(source_iterator):",
"implement binary search tree",
"how to handle memory efficient data streaming",
"""class LazyLoader:
def __init__(self, source):
self.generator = iter(source)
self._cache = []"""
]
# Load original model
print("Loading original model...")
original_model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")
# Load distilled model
print("Loading distilled model...")
distilled_model = StaticModel.from_pretrained("models/qodo_embed_m2v")
# Compare embedding speed
print("\nGenerating embeddings with original model...")
start = time.time()
original_embeddings = original_model.encode(code_samples)
original_time = time.time() - start
print(f"Original model took: {original_time:.4f} seconds")
print("\nGenerating embeddings with distilled model...")
start = time.time()
distilled_embeddings = distilled_model.encode(code_samples)
distilled_time = time.time() - start
print(f"Distilled model took: {distilled_time:.4f} seconds")
print(f"Speed improvement: {original_time/distilled_time:.2f}x faster")
print(f"\nOriginal embedding dimensions: {original_embeddings.shape}")
print(f"Distilled embedding dimensions: {distilled_embeddings.shape}")
Results
The distilled model achieves:
- 25.26x reduction in model size (from 5.9GB to 233MB)
- 112.14x increase in inference speed
- 85.1% explained variance with PCA reduction to 384 dimensions
Detailed evaluation results, including similarity plots and performance metrics, are saved to the evaluation output directory.
Project Structure
distill.py
- Script to create the distilled modelevaluate.py
- Script to compare performance with the original modelexample.py
- Example usage of the distilled modelevaluation/
- Directory containing evaluation results and visualizations
Acknowledgments
This project is built upon the following technologies:
- Qodo-Embed-1-1.5B - The original code embedding model developed by QodoAI
- Model2Vec - The distillation technique used to optimize the model
License
This model is licensed under the QodoAI-Open-RAIL-M license, the same as the original Qodo-Embed-1-1.5B model. Any derivative model must include "Qodo" at the beginning of its name per the license requirements.
- Downloads last month
- 77
Model tree for sarthak1/Qodo-Embed-M-1-1.5B-M2V-Distilled
Base model
Alibaba-NLP/gte-Qwen2-1.5B-instruct