aevum-0.6B-Finetuned

1. Model Overview

The aevum-0.6B-Finetuned model is a small-scale (0.6 billion parameter) language model finetuned for code generation and general instruction following.
It is built on the Qwen3-0.6B architecture and designed for low-latency inference on common hardware, prioritizing efficiency over state-of-the-art performance. It achieves a HumanEval Pass@1 score of 21.34%, surpassing vex-amber-mini-1.0 (20.21%), the previously reported best result for parameter efficiency in the sub-1B category (see Section 4 for details).

| Attribute | Detail |
| --- | --- |
| Model Name | aevum-0.6B-Finetuned |
| Base Model | Qwen3-0.6B |
| Model Type | Decoder-Only Transformer |
| Parameters | 0.6 Billion (0.6B) |
| Task | Code Generation, Instruction Following |
| Language | Primarily English, Python (for code) |
| License | Apache 2.0 |

2. Intended Use and Limitations

✅ Intended Use

This model is best suited for:

  • Quick prototyping and local development where hardware resources are limited.
  • Serving as a small, fast instruction-following model on CPU or edge devices (see the sketch after this list).
  • Educational purposes and learning about model finetuning techniques.
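
For the CPU use case, the standard transformers pipeline API is sufficient. A minimal sketch, assuming the repository id given in the Usage section below; the prompt and generation settings are illustrative:

from transformers import pipeline

# Minimal CPU inference sketch; device=-1 pins the pipeline to CPU.
generator = pipeline(
    "text-generation",
    model="Aevum-Official/aveum-0.6B-Finetuned",
    device=-1,
)
result = generator(
    "Write a Python function to check if a number is prime.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])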

⚠️ Limitations

  • Code Complexity: Struggles with complex algorithmic problems, multi-file projects, and intricate data structures.
  • Knowledge Cutoff: Limited by pretraining data scope.
  • Hallucination: May generate syntactically incorrect or factually wrong code.
  • Bias: Reflects dataset biases like all LLMs.

3. Training and Finetuning Details

Finetuning

The base Qwen3-0.6B model was finetuned to enhance Python code generation and instruction-following.

  • Datasets: MBPP (Mostly Basic Python Problems) and DeepMind/Code Contests
  • Training Method: Supervised Finetuning (SFT) or Parameter-Efficient Finetuning (PEFT); see the sketch after this list.
  • Goal: Improve problem-solving and completion accuracy while maintaining small model efficiency.
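
The exact training recipe is not published. As a rough illustration of the SFT/PEFT setup described above, the sketch below pairs trl's SFTTrainer with a LoRA adapter on the MBPP training split; the hyperparameters, the record-formatting helper, and the choice of target modules are all illustrative assumptions, not the actual configuration:

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical formatting: concatenate each MBPP problem statement with its
# reference solution into a single training text.
def to_text(example):
    return {"text": f"{example['text']}\n\n{example['code']}"}

dataset = load_dataset("mbpp", split="train").map(to_text)

# Illustrative LoRA settings targeting the attention projections.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # base checkpoint
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="aevum-0.6B-sft"),
)
trainer.train()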

4. Evaluation Results

The model was evaluated using the HumanEval benchmark via the lm-evaluation-harness.
HumanEval measures a model’s ability to produce functionally correct Python code given a docstring and function signature.
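
A comparable run can be reproduced through the harness's Python entry point. This is a sketch assuming a recent (0.4.x) lm-evaluation-harness release; because HumanEval executes model-generated code, recent versions require an explicit opt-in:

import lm_eval

# Sketch of a HumanEval evaluation run (0.4.x API assumed).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Aevum-Official/aveum-0.6B-Finetuned",
    tasks=["humaneval"],
    confirm_run_unsafe_code=True,  # HumanEval executes generated code
)
print(results["results"]["humaneval"])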

| Metric | Score | Description |
| --- | --- | --- |
| HumanEval Pass@1 | 21.34% | Probability that the first generated sample is functionally correct. |
| HumanEval Pass@10 | Not yet evaluated | Estimates potential when sampling multiple generations. |
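
For context on the Pass@10 row: pass@k is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021). Generate n >= k samples per problem, count the c that pass the unit tests, and average 1 - C(n-c, k) / C(n, k) over all problems. A small reference implementation:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # n: samples generated per problem; c: samples that pass; k: attempt budget
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples with 40 correct gives a pass@10 estimate
print(pass_at_k(200, 40, 10))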

Peer Comparison (≈0.6B Models, with Larger Models for Context)

| Model Name | Parameters | Base Architecture / Organization | HumanEval Pass@1 |
| --- | --- | --- | --- |
| aevum-0.6B-Finetuned | 0.6B | Qwen3-0.6B | 21.34% |
| vex-amber-mini-1.0 | 0.6B | Qwen3-0.6B | 20.21% |
| Qwen3-0.6B (Base) | 0.6B | N/A | ≈10–12% (est.) |
| CodeT5 | 3B | Google | ≈20% |
| CodeGen | 6B | Salesforce | ≈22% |
| Code Llama | 7B | Meta | ≈24% |
| StarCoder | 7B | HF / ServiceNow | ≈25% |
| PolyCoder | 12.7B | Berkeley | ≈28% |

Interpretation: The New Efficiency Benchmark

The aevum-0.6B-Finetuned model achieves a Pass@1 score of 21.34%, surpassing the previously reported record for parameter efficiency in the sub-1B category, vex-amber-mini-1.0 (20.21%), by just over one percentage point.

This result supports the custom finetuning approach used and establishes aevum-0.6B-Finetuned as a new reference point for code-generation efficiency at the 0.6-billion-parameter scale, offering performance comparable to models roughly ten times its size.


5. Usage

You can use the model with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the finetuned model and its tokenizer from the Hugging Face Hub.
model_name = "Aevum-Official/aveum-0.6B-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a natural-language coding prompt and generate a completion.
prompt = "Write a Python function to check if a number is prime."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
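
Since the base model is Qwen3-0.6B, instruction-style prompts may work better through the tokenizer's chat template. A minimal sketch, assuming the finetune preserves the base model's template (it reuses the tokenizer and model from the snippet above):

# Assumes the finetune keeps the Qwen3 chat template from the base tokenizer.
messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
)
outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))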

Example Outputs

Input:

Write a Python function to reverse a string.

Output:

def reverse_string(s: str) -> str:
    return s[::-1]

6. Citation & Acknowledgments

If you use aevum-0.6B-Finetuned, please cite or reference this repository and model card:

@misc{aevum06B2025,
  title = {aevum-0.6B-Finetuned: Lightweight Python Code Generation Model},
  author = {anonymous},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Aevum-Official/aveum-0.6B-Finetuned}},
  note = {Fine-tuned on MBPP and DeepMind Code Contests for efficient Python problem-solving}
}

Acknowledgments

  • Base model: Qwen3-0.6B
  • Datasets: MBPP and DeepMind Code Contests
  • Evaluation: lm-evaluation-harness (HumanEval benchmark)
  • License: Apache 2.0

🧠 The aevum-0.6B-Finetuned model aims to democratize code generation by providing a compact, open, and efficient model for learners, developers, and researchers working on constrained hardware.
