Qodo-Embed-M-1-1.5B-M2V-Distilled

This project optimizes the Qodo-Embed-1-1.5B model using Model2Vec, reducing its size and dramatically improving inference speed while retaining most of the original model's retrieval quality.

Overview

Qodo-Embed-1-1.5B is a state-of-the-art code embedding model designed for retrieval tasks in the software development domain. While powerful, it can be resource-intensive for production use cases.

Model2Vec is a technique to distill large sentence transformer models into small, fast static embedding models. This project applies Model2Vec to create an optimized version of Qodo-Embed-1-1.5B with the following benefits:

  • Smaller Size: Roughly 25x smaller on disk
  • Faster Inference: Up to 112x faster encoding
  • Low Resource Requirements: Minimal memory footprint and dependencies
  • Maintains Performance: Retains most of the original model's capabilities
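
Under the hood, Model2Vec distillation is a short API call. The snippet below is a minimal sketch of that API (independent of the distill.py script documented later), using the model2vec distill function:

from model2vec.distill import distill

# Distill the original sentence transformer into a static embedding model.
# pca_dims controls the output dimensionality (384 matches this project).
m2v_model = distill(model_name="Qodo/Qodo-Embed-1-1.5B", pca_dims=384)

# Save the result so it can be loaded later with StaticModel.from_pretrained.
m2v_model.save_pretrained("models/qodo_embed_m2v")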

Model Information

  • Model Name: Qodo-Embed-M-1-1.5B-M2V-Distilled
  • Original Model: Qodo-Embed-1-1.5B
  • Distillation Method: Model2Vec
  • Original Dimensions: 1536
  • Distilled Dimensions: 384
  • Explained Variance: ~85%
  • Size Reduction: 25.26x (from 5.9GB to 233MB)
  • Speed Improvement: 112.14x faster
  • Parameters: 58.2M (F32)

Installation

First, ensure you have the required dependencies:

# Install the base package
uv add --group model2vec model2vec 'model2vec[distill]' sentence-transformers transformers

# Install additional dependencies for evaluation
uv add --group model2vec matplotlib psutil
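
If you are not using uv, a rough pip equivalent (adjust pins to your environment) is:

# Equivalent installation with pip
pip install 'model2vec[distill]' sentence-transformers transformers matplotlib psutil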

Usage

Distillation

To create a distilled version of Qodo-Embed-1-1.5B:

python models/qodo_embed_m2v/distill.py --pca_dims 384

Options:

  • --model_name - Source model name (default: "Qodo/Qodo-Embed-1-1.5B")
  • --output_dir - Where to save the distilled model (default: "models/qodo_embed_m2v")
  • --pca_dims - Dimensions for PCA reduction; smaller values create faster but less accurate models (default: 384)
  • --save_to_hub - Push the model to HuggingFace Hub
  • --hub_model_id - Model ID for HuggingFace Hub (required if saving to hub)
  • --skip_readme - Skip generating a README for the distilled model (default: True)
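
For example, to distill at 384 dimensions and push the result to the Hugging Face Hub in one step (the hub model ID below is a placeholder; note the license requires derivative model names to start with "Qodo"):

python models/qodo_embed_m2v/distill.py \
    --pca_dims 384 \
    --save_to_hub \
    --hub_model_id your-username/Qodo-Embed-M-1-1.5B-M2V-Distilled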

Evaluation

To evaluate the distilled model against the original:

python models/qodo_embed_m2v/evaluate.py

Options:

  • --original_model - Original model name (default: "Qodo/Qodo-Embed-1-1.5B")
  • --distilled_model - Path to the distilled model (default: "models/qodo_embed_m2v")
  • --output_dir - Where to save evaluation results (default: "models/qodo_embed_m2v/evaluation")
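
The metrics the script reports live in evaluate.py itself; as a rough illustration of the kind of comparison involved, the sketch below checks how well the distilled model preserves the original model's pairwise cosine-similarity structure (an illustrative example, not the script's actual implementation):

import numpy as np
from model2vec import StaticModel
from sentence_transformers import SentenceTransformer

texts = [
    "def add(a, b): return a + b",
    "implement binary search",
    "read a CSV file line by line",
]

original = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B").encode(texts)
distilled = StaticModel.from_pretrained("models/qodo_embed_m2v").encode(texts)

def cosine_sim_matrix(x):
    # Normalize rows so the dot product yields pairwise cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

# A small mean difference means the similarity structure is well preserved.
diff = np.abs(cosine_sim_matrix(original) - cosine_sim_matrix(distilled))
print(f"Mean absolute similarity difference: {diff.mean():.4f}")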

Example Code

from model2vec import StaticModel
from sentence_transformers import SentenceTransformer
import time

# Sample code for embedding
code_samples = [
    "def process_data_stream(source_iterator):",
    "implement binary search tree",
    "how to handle memory efficient data streaming",
    """class LazyLoader:
        def __init__(self, source):
            self.generator = iter(source)
            self._cache = []"""
]

# Load original model
print("Loading original model...")
original_model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")

# Load distilled model
print("Loading distilled model...")
distilled_model = StaticModel.from_pretrained("models/qodo_embed_m2v")

# Compare embedding speed
print("\nGenerating embeddings with original model...")
start = time.time()
original_embeddings = original_model.encode(code_samples)
original_time = time.time() - start
print(f"Original model took: {original_time:.4f} seconds")

print("\nGenerating embeddings with distilled model...")
start = time.time()
distilled_embeddings = distilled_model.encode(code_samples)
distilled_time = time.time() - start
print(f"Distilled model took: {distilled_time:.4f} seconds")
print(f"Speed improvement: {original_time/distilled_time:.2f}x faster")

print(f"\nOriginal embedding dimensions: {original_embeddings.shape}")
print(f"Distilled embedding dimensions: {distilled_embeddings.shape}")

Results

The distilled model achieves:

  • 25.26x reduction in model size (from 5.9GB to 233MB)
  • 112.14x increase in inference speed
  • 85.1% explained variance with PCA reduction to 384 dimensions

Detailed evaluation results, including similarity plots and performance metrics, are saved to the evaluation output directory.
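
For context on the explained-variance figure: Model2Vec computes it during distillation when projecting embeddings down to 384 dimensions with PCA. The sketch below only illustrates the metric itself, fitting PCA on a set of original-model embeddings and summing the per-component explained variance ratios (it assumes scikit-learn, which is not in the install list above, and a real corpus rather than the padded toy one shown):

from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

# Use a representative corpus in practice; PCA to 384 components needs
# at least 384 samples, so this toy corpus is padded for runnability.
corpus = [f"sample code snippet {i}" for i in range(500)]
embeddings = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B").encode(corpus)

# Sum of per-component explained variance ratios at 384 dimensions.
pca = PCA(n_components=384).fit(embeddings)
print(f"Explained variance at 384 dims: {pca.explained_variance_ratio_.sum():.1%}")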

Project Structure

  • distill.py - Script to create the distilled model
  • evaluate.py - Script to compare performance with the original model
  • example.py - Example usage of the distilled model
  • evaluation/ - Directory containing evaluation results and visualizations

Acknowledgments

This project is built upon the following technologies:

  • Qodo-Embed-1-1.5B - The original code embedding model developed by QodoAI
  • Model2Vec - The distillation technique used to optimize the model

License

This model is licensed under the QodoAI-Open-RAIL-M license, the same as the original Qodo-Embed-1-1.5B model. Any derivative model must include "Qodo" at the beginning of its name per the license requirements.
