aevum-0.6B-Finetuned

1. Model Overview

The aevum-0.6B-Finetuned model is a small-scale (0.6 billion parameter) language model finetuned for code generation and general instruction following.
It is built on the Qwen3-0.6B architecture and designed for low-latency inference on common hardware, prioritizing efficiency over state-of-the-art performance. It achieves a HumanEval Pass@1 score of 21.34%, surpassing vex-amber-mini-1.0 (20.21%), the previously reported best result for parameter efficiency in the sub-1B category (see Section 4 for details).

| Attribute | Detail |
| --- | --- |
| Model Name | aevum-0.6B-Finetuned |
| Base Model | Qwen3-0.6B |
| Model Type | Decoder-Only Transformer |
| Parameters | 0.6 Billion (0.6B) |
| Task | Code Generation, Instruction Following |
| Language | Primarily English, Python (for code) |
| License | Apache 2.0 |

2. Intended Use and Limitations

✅ Intended Use

This model is best suited for:

  • Quick prototyping and local development where hardware resources are limited.
  • Serving as a small, fast instruction-following model on CPU or edge devices (see the sketch after this list).
  • Educational purposes and learning about model finetuning techniques.
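
For the CPU use case, the standard transformers pipeline API is sufficient. A minimal sketch, assuming the repository id given in the Usage section below; the prompt and generation settings are illustrative:

from transformers import pipeline

# Minimal CPU inference sketch; device=-1 pins the pipeline to CPU.
generator = pipeline(
    "text-generation",
    model="Aevum-Official/aveum-0.6B-Finetuned",
    device=-1,
)
result = generator(
    "Write a Python function to check if a number is prime.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])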

⚠️ Limitations

  • Code Complexity: Struggles with complex algorithmic problems, multi-file projects, and intricate data structures.
  • Knowledge Cutoff: Limited by pretraining data scope.
  • Hallucination: May generate syntactically incorrect or factually wrong code.
  • Bias: Reflects dataset biases like all LLMs.

3. Training and Finetuning Details

Finetuning

The base Qwen3-0.6B model was finetuned to enhance Python code generation and instruction-following.

  • Datasets: MBPP (Mostly Basic Python Problems) and DeepMind/Code Contests
  • Training Method: Supervised Finetuning (SFT) or Parameter-Efficient Finetuning (PEFT); see the sketch after this list.
  • Goal: Improve problem-solving and completion accuracy while maintaining small model efficiency.
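
The exact training recipe is not published. As a rough illustration of the SFT/PEFT setup described above, the sketch below pairs trl's SFTTrainer with a LoRA adapter on the MBPP training split; the hyperparameters, the record-formatting helper, and the choice of target modules are all illustrative assumptions, not the actual configuration:

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical formatting: concatenate each MBPP problem statement with its
# reference solution into a single training text.
def to_text(example):
    return {"text": f"{example['text']}\n\n{example['code']}"}

dataset = load_dataset("mbpp", split="train").map(to_text)

# Illustrative LoRA settings targeting the attention projections.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # base checkpoint
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="aevum-0.6B-sft"),
)
trainer.train()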

4. Evaluation Results

The model was evaluated using the HumanEval benchmark via the lm-evaluation-harness.
HumanEval measures a model’s ability to produce functionally correct Python code given a docstring and function signature.
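
A comparable run can be reproduced through the harness's Python entry point. This is a sketch assuming a recent (0.4.x) lm-evaluation-harness release; because HumanEval executes model-generated code, recent versions require an explicit opt-in:

import lm_eval

# Sketch of a HumanEval evaluation run (0.4.x API assumed).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Aevum-Official/aveum-0.6B-Finetuned",
    tasks=["humaneval"],
    confirm_run_unsafe_code=True,  # HumanEval executes generated code
)
print(results["results"]["humaneval"])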

| Metric | Score | Description |
| --- | --- | --- |
| HumanEval Pass@1 | 21.34% | Probability that the first generated sample is functionally correct. |
| HumanEval Pass@10 | Not yet evaluated | Estimates potential when sampling multiple generations. |
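
For context on the Pass@10 row: pass@k is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021). Generate n >= k samples per problem, count the c that pass the unit tests, and average 1 - C(n-c, k) / C(n, k) over all problems. A small reference implementation:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # n: samples generated per problem; c: samples that pass; k: attempt budget
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples with 40 correct gives a pass@10 estimate
print(pass_at_k(200, 40, 10))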

Peer Comparison (≈0.6B Models, with Larger Models for Context)

| Model Name | Parameters | Base Architecture / Organization | HumanEval Pass@1 |
| --- | --- | --- | --- |
| aevum-0.6B-Finetuned | 0.6B | Qwen3-0.6B | 21.34% |
| vex-amber-mini-1.0 | 0.6B | Qwen3-0.6B | 20.21% |
| Qwen3-0.6B (Base) | 0.6B | N/A | ≈10–12% (est.) |
| CodeT5 | 3B | Google | ≈20% |
| CodeGen | 6B | Salesforce | ≈22% |
| Code Llama | 7B | Meta | ≈24% |
| StarCoder | 7B | HF / ServiceNow | ≈25% |
| PolyCoder | 12.7B | Berkeley | ≈28% |

Interpretation: The New Efficiency Benchmark

The aevum-0.6B-Finetuned model achieves a Pass@1 score of 21.34%, surpassing the previously reported record for parameter efficiency in the sub-1B category, vex-amber-mini-1.0 (20.21%), by just over one percentage point.

This result supports the custom finetuning approach used and establishes aevum-0.6B-Finetuned as a new reference point for code-generation efficiency at the 0.6-billion-parameter scale, offering performance comparable to models roughly ten times its size.


5. Usage

You can use the model with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and its tokenizer from the Hugging Face Hub.
model_name = "Aevum-Official/aveum-0.6B-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a natural-language coding prompt and generate a completion.
prompt = "Write a Python function to check if a number is prime."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
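
Since the base model is Qwen3-0.6B, instruction-style prompts may work better through the tokenizer's chat template. A minimal sketch, assuming the finetune preserves the base model's template (it reuses the tokenizer and model from the snippet above):

# Assumes the finetune keeps the Qwen3 chat template from the base tokenizer.
messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
)
outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))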

Example Outputs

Input:

Write a Python function to reverse a string.

Output:

def reverse_string(s: str) -> str:
    return s[::-1]

6. Citation & Acknowledgments

If you use aevum-0.6B-Finetuned, please cite or reference this repository and model card:

@misc{aevum06B2025,
  title = {aevum-0.6B-Finetuned: Lightweight Python Code Generation Model},
  author = {anonymous},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Aevum-Official/aveum-0.6B-Finetuned}},
  note = {Fine-tuned on MBPP and DeepMind Code Contests for efficient Python problem-solving}
}

Acknowledgments

  • Base model: Qwen3-0.6B
  • Datasets: MBPP and DeepMind Code Contests
  • Evaluation: lm-evaluation-harness (HumanEval benchmark)
  • License: Apache 2.0

🧠 The aevum-0.6B-Finetuned model aims to democratize code generation by providing a compact, open, and efficient model for learners, developers, and researchers working on constrained hardware.
