M3-V2: An Open Source Model for State-of-the-Art Code Generation

M3-V2 is a state-of-the-art causal language model featuring a novel architecture that enables advanced reasoning and self-correction. This model is fully open source under the Apache 2.0 license, making it available for academic, personal, and commercial use.

The model achieves a 98.17% Pass@1 score on the HumanEval benchmark with its default single self-correction pass, placing it among the strongest open-source code generation models available today.


Benchmark Performance

The benchmark results show performance that surpasses many publicly available models.

HumanEval Benchmark Chart

Performance Comparison

| Model | HumanEval Pass@1 Score | Note |
|---|---|---|
| moelanoby/phi3-M3-V2 (this model) | 95.12% / 98.17% / 98.56% | Apache 2.0 license. Scores correspond to 0, 1, and 2 self-correction passes; 1 is the default. |
| GPT-4.5 / "Orion" | ~96.00% | Projected (Late 2025) |
| Gemini 2.5 Pro | ~95.00% | Projected (Late 2025) |
| Claude 4 | ~94.00% | Projected (Late 2025) |
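
For reference, Pass@1 on HumanEval is conventionally estimated with the unbiased pass@k estimator from the original HumanEval paper. Below is a minimal sketch of that estimator; the numbers in the example are illustrative only and are not taken from this model's evaluation.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: n samples generated per problem, c of which pass the tests.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative only: 200 samples with 183 passing gives pass@1 = 0.915.
print(pass_at_k(n=200, c=183, k=1))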

Support the Project

M3-V2 is an open-source project, free for everyone to use. I am passionate about creating powerful and accessible AI tools for the community.

If you find this model helpful in your work, research, or personal projects, please consider supporting its development. Your contribution helps cover training costs, allows me to dedicate more time to improvements, and fuels the creation of new open-source models. Every little bit helps and is greatly appreciated!

Support via PayPal


License

This model is licensed under the Apache 2.0 License. You are free to use, modify, and distribute this model and its source code for any purpose, including commercial applications, subject to the terms of the license. You can find a copy of the license in the repository.

Ethical Considerations

While this model is open source, users are encouraged to use it responsibly. Finetuning the model to generate harmful, illegal, or unethical content is strongly discouraged. I advocate for the use of this technology to build positive and safe applications.

Also, please don't feed this architecture into any image-generation AI models. I care a great deal about supporting real artists, and it would be sad to see their work overtaken by AI art. :/


How to Use

Follow the installation steps below, then use the Python example to load and run the model. :]

Installation

First, ensure you have the necessary libraries installed:

pip install torch transformers accelerate
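
Optionally, the quick sanity check below confirms that the libraries import correctly and whether a GPU is visible. This is just a convenience snippet, not an additional requirement.

import torch
import transformers

# Print library versions and whether a CUDA GPU is available.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())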

Python Implementation

You can easily integrate the model into your application. You must use trust_remote_code=True for the custom architecture to load correctly from the Hub.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "moelanoby/phi3-M3-V2"

print("Loading tokenizer and model...")
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID, 
    trust_remote_code=True, 
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print("Model loaded successfully.")

# --- Controlling the model's self-correction feature ---
# Default is 1 pass. You can adjust it for different performance profiles.
try:
    target_layer_path = "model.layers.15.mlp.gate_up_proj" 
    custom_layer = model
    for part in target_layer_path.split('.'):
        custom_layer = getattr(custom_layer, part)
        
    # Set the number of self-correction passes (e.g., 0, 1, 2, or 3)
    custom_layer.num_correction_passes = 2 
    print(f"โœ… Number of self-correction passes set to: {custom_layer.num_correction_passes}")
except AttributeError:
    print("โš ๏ธ Could not access the custom layer. The model will run with its default settings.")

# (Example generation code would follow here)
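
As a starting point, here is a minimal generation sketch using the standard transformers generation API. It assumes the tokenizer ships a chat template (as Phi-3-style models typically do); the prompt and generation settings are examples only.

# Minimal generation sketch (assumes the tokenizer provides a chat template).
messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))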

Important Notes

  • Downside: adding more self-correction passes can make the model less coherent or less accurate. Experiment to find the best balance for your use case.
  • Recommendation: use 1, 2, or 3 self-correction passes as needed; 2 passes is the recommended setting for a good balance of performance and coherence (see the comparison sketch below).
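
To see the trade-off directly, the sketch below reuses `tokenizer`, `model`, and `custom_layer` from the loading example above (assuming the custom layer was found) and compares greedy generations at 0, 1, and 2 passes. The prompt is only an example.

# Compare outputs at different numbers of self-correction passes (example prompt).
prompt = "Write a Python function that merges two sorted lists into one sorted list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

for passes in (0, 1, 2):
    custom_layer.num_correction_passes = passes
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(f"--- {passes} self-correction pass(es) ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))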

Acknowledgements

  • The base of this model utilizes the Phi-3 architecture developed by Microsoft.
  • The benchmark results were obtained using the HumanEval dataset from OpenAI.
  • I thank the open-source community for their continuous contributions to AI research.