---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
base_model: meta-llama/Llama-3.2-3B
tags:
- language
- aquif
- text-generation-inference
- math
- coding
- small
language:
- en
- de
- it
- pt
- fr
- hi
- es
- th
- zh
- ja
---

# aquif-3-mini
A high-performance 3.2B-parameter language model based on Meta's Llama 3.2 architecture, optimized for efficiency while maintaining strong capabilities across general knowledge, science, mathematics, coding, and multilingual tasks.
## Model Details

- **Base Model**: meta-llama/Llama-3.2-3B
- **Architecture**: Llama
- **Parameter Count**: 3.2 billion
- **Languages**: English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, Japanese
## Performance Benchmarks

### Detailed Benchmark Results

| Metric | aquif-3-mini (3.2B) | Llama 3.2 (3.2B) | Qwen3 (4B) | Gemma 3n E4B (8.4B) | SmolLM3 (3.1B) | Phi-4 mini (3.8B) | Granite 3.3 (2.5B) |
|---|---|---|---|---|---|---|---|
| MMLU (General Knowledge) | 67.5 | 63.4 | 67.0 | 64.9 | 59.5 | 67.3 | 55.9 |
| GPQA Diamond (Science) | 36.1 | 29.4 | 40.7 | 29.6 | 35.7 | 36.9 | 25.3 |
| AIME 2025 (Competition Math) | 9.6 | 0.3 | 17.1 | 11.6 | 9.3 | 10.0 | 2.5 |
| LiveCodeBench (Coding) | 15.4 | 8.3 | 23.3 | 14.6 | 15.2 | 12.6 | 9.4 |
| Global MMLU (Multilingual) | 58.0 | 46.8 | 65.1 | 53.1 | 53.5 | 49.3 | 49.7 |
| IFEval (Instruction Following) | 78.9 | 71.6 | 68.9 | 56.8 | 76.7 | 70.1 | 65.8 |
| BFCL Simple (Tool Calling) | 92.3 | 78.6 | 81.3 | 71.8 | 88.8 | 70.3 | 72.2 |
### Key Strengths

- **Exceptional Tool Calling**: Achieves 92.3% on the BFCL Simple benchmark, outperforming all comparison models (see the sketch after this list)
- **Strong Instruction Following**: 78.9% on IFEval, the best score in the comparison, demonstrating reliable adherence to complex instructions
- **Comprehensive Knowledge**: 67.5% on MMLU, matching or exceeding larger models
- **Solid Scientific Reasoning**: 36.1% on GPQA Diamond, competitive with similarly sized models
- **Multilingual Competency**: Supports 10 languages with competitive Global MMLU performance
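
Given the BFCL score above, tool calling is a likely headline use case. Below is a minimal sketch of how tool use typically works with transformers chat templates, assuming aquif-3-mini ships a chat template that accepts the `tools` argument (as the Llama 3.1/3.2 instruct templates do); `get_weather` is a hypothetical function used only for illustration.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 24°C"  # hypothetical stub; the model only sees the schema

model_name = "aquiffoo/aquif-3-mini"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "What's the weather in Lisbon right now?"}]

# The chat template renders the tool schema into the prompt; the model is
# expected to respond with a JSON tool call that your code parses and executes.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```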
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-3-mini"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text (do_sample=True is required for temperature to take effect)
inputs = tokenizer("Explain quantum computing:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
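
For quick experiments, the high-level `pipeline` API wraps the tokenizer and model in a single call:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="aquiffoo/aquif-3-mini")
result = generator("Explain quantum computing:", max_new_tokens=200, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```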
## License

Apache 2.0

## Acknowledgements
We gratefully acknowledge:
- **Meta AI** for the foundational Llama 3.2 architecture and pre-trained weights
- **Hugging Face** for the transformers library and the model hosting platform that enables easy access and deployment
For questions, issues, or collaboration opportunities, please reach out through the Hugging Face model page.