aveum-0.6B-Finetuned
1. Model Overview
The aevum-0.6B-Finetuned model is a small-scale (0.6 Billion parameter) language model that has been finetuned for code generation and general instruction-following.
It is built upon the Qwen3-0.6B architecture and designed for low-latency inference on common hardware, prioritizing efficiency over state-of-the-art performance.
It has achieved a Pass@1 score of 21.34%, a significant leap past the former alleged world-record holder for parameter efficiency in the sub-1B category, vex-amber-mini-1.0 (which scored 20.21%) and now holds the record itself.
Attribute | Detail |
---|---|
Model Name | aevum-0.6B-Finetuned |
Base Model | Qwen3-0.6B |
Model Type | Decoder-Only Transformer |
Parameters | 0.6 Billion (0.6B) |
Task | Code Generation, Instruction Following |
Language | Primarily English, Python (for code) |
License | Apache 2.0 |
2. Intended Use and Limitations
β Intended Use
This model is best suited for:
- Quick prototyping and local development where hardware resources are limited.
- Serving as a small, fast instruction-following model on CPU or edge devices.
- Educational purposes and learning about model finetuning techniques.
β οΈ Limitations
- Code Complexity: Struggles with complex algorithmic problems, multi-file projects, and intricate data structures.
- Knowledge Cutoff: Limited by pretraining data scope.
- Hallucination: May generate syntactically incorrect or factually wrong code.
- Bias: Reflects dataset biases like all LLMs.
3. Training and Finetuning Details
Finetuning
The base Qwen3-0.6B model was finetuned to enhance Python code generation and instruction-following.
- Datasets: MBPP (Mostly Basic Python Problems) and DeepMind/Code Contests
- Training Method: Supervised Finetuning (SFT) or Parameter-Efficient Finetuning (PEFT)
- Goal: Improve problem-solving and completion accuracy while maintaining small model efficiency.
4. Evaluation Results
The model was evaluated using the HumanEval benchmark via the lm-evaluation-harness
.
HumanEval measures a modelβs ability to produce functionally correct Python code given a docstring and function signature.
Metric | Score | Description |
---|---|---|
HumanEval Pass@1 | 21.34% | Probability the first output is functionally correct. |
HumanEval Pass@10 | Not yet evaluated | Used to estimate potential with multiple generations. |
Peer Comparison (β0.6B Parameter Models)
Model Name | Parameters | Base Architecture | HumanEval Pass@1 |
---|---|---|---|
aevum-0.6B-Finetuned | 0.6B | Qwen3-0.6B | 21.34% |
vex-amber-mini-1.0 | 0.6B | Qwen3-0.6B | 20.21% |
Qwen3-0.6B (Base) | 0.6B | β | β10β12% (est.) |
CodeT5 | 3 B | β 20 % | |
CodeGen | 6 B | Salesforce | β 22 % |
Code Llama | 7 B | Meta | β 24 % |
StarCoder | 7 B | HF / ServiceNow | β 25 % |
PolyCoder | 12.7 B | Berkeley | β 28 % |
Interpretation: The New Efficiency Benchmark
The aveum-0.6B-Finetuned model achieves a Pass@1 score of 21.34%, a significant leap past the former alleged world-record holder for parameter efficiency in the sub-1B category, vex-amber-mini-1.0 (which scored 20.21%).
This result validates the custom fine-tuning approach used and establishes aveum-0.6B-Finetuned as the new benchmark for code generation efficiency at the 0.6 Billion parameter scale, offering performance comparable to models over ten times its size.
5. Usage
You can quickly use the model via the Hugging Face transformers
or huggingface_hub
library:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Aevum-Official/aveum-0.6B-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "Write a Python function to check if a number is prime."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Example Outputs
Input:
Write a Python function to reverse a string.
Output:
def reverse_string(s: str) -> str:
return s[::-1]
6. Citation & Acknowledgments
If you use aevum-0.6B-Finetuned, please cite or reference this repository and model card:
@misc{aveum06B2025,
title = {aevum-0.6B-Finetuned: Lightweight Python Code Generation Model},
author = {anonymous},
year = {2025},
howpublished = {\url{https://huggingface.co/your-username/aveum-0.6B-Finetuned}},
note = {Fine-tuned on MBPP and DeepMind Code Contests for efficient Python problem-solving}
}
Acknowledgments
- Base model: Qwen3-0.6B
- Datasets: MBPP and DeepMind Code Contests
- Evaluation: lm-evaluation-harness (HumanEval benchmark)
- License: Apache 2.0
π§ The aveum-0.6B-Finetuned model aims to democratize code generation by providing a compact, open, and efficient model for learners, developers, and researchers working on constrained hardware.
- Downloads last month
- 13
Model tree for Aevum-Official/Aevum-0.6B-Finetuned
Datasets used to train Aevum-Official/Aevum-0.6B-Finetuned
Evaluation results
- humanevalself-reported21.340