aquif-3-moe (17B)

A high-performance mixture-of-experts language model built for efficient inference across coding, math, science, and general use. With 17B total parameters and only 2.8B active parameters, aquif-3-moe delivers competitive performance across multiple domains at a compute cost closer to that of a much smaller dense model.

Model Details

Architecture: Mixture of Experts (MoE)
Total Parameters: 17 billion
Active Parameters: 2.8 billion
License: Apache 2.0
Library: transformers

Performance Benchmarks

| Metric | aquif-3-moe (17B a2.8B) | Phi-4 (14B) | Qwen3 (14B) | Gemma 3 (27B) | GPT-4.1 nano (Propr.) | Mistral Small 3.2 (24B) |
|---|---|---|---|---|---|---|
| MMLU (General Knowledge) | 83.2 | 84.8 | 82.0 | 78.6 | 80.1 | 80.5 |
| LiveCodeBench (Coding) | 28.6 | 25.2 | 29.0 | 26.9 | 32.6 | 27.5 |
| MATH-500 (Math) | 91.4 | 80.8 | 89.8 | 88.3 | 84.8 | 88.3 |
| GPQA Diamond (Science) | 56.7 | 56.1 | 54.8 | 42.8 | 50.3 | 50.5 |
| Average | 65.0 | 61.7 | 63.9 | 59.2 | 62.0 | 61.7 |
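The Average row is the unweighted mean of the four benchmark scores. A quick sanity check, using the values copied from the table above:

import statistics

# Benchmark scores per model, in table order: MMLU, LiveCodeBench, MATH-500, GPQA Diamond
scores = {
    "aquif-3-moe (17B a2.8B)": [83.2, 28.6, 91.4, 56.7],
    "Phi-4 (14B)": [84.8, 25.2, 80.8, 56.1],
    "Qwen3 (14B)": [82.0, 29.0, 89.8, 54.8],
    "Gemma 3 (27B)": [78.6, 26.9, 88.3, 42.8],
    "GPT-4.1 nano (Propr.)": [80.1, 32.6, 84.8, 50.3],
    "Mistral Small 3.2 (24B)": [80.5, 27.5, 88.3, 50.5],
}
for model, vals in scores.items():
    print(f"{model}: {statistics.mean(vals):.1f}")  # matches the Average row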

Key Strengths

  • Mathematical Reasoning: Achieves 91.4% on MATH-500, demonstrating exceptional mathematical problem-solving capabilities
  • Scientific Understanding: Highest GPQA Diamond score among the compared models at 56.7%, showing strong scientific reasoning
  • Efficiency: Delivers competitive performance with only 2.8B active parameters
  • General Knowledge: Strong MMLU performance at 83.2%

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Repository ID as listed on this model card
model_name = "aquiffoo/aquif-3-moe-17b-a2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
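For GPU inference, loading the weights in bfloat16 with automatic device placement keeps memory use close to the checkpoint size. A minimal sketch, assuming a CUDA-capable GPU, the accelerate package installed, and that the repository ships a chat template:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-3-moe-17b-a2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype=torch.bfloat16 loads the weights in half precision;
# device_map="auto" (requires `accelerate`) places layers on available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style prompting via the tokenizer's chat template
messages = [{"role": "user", "content": "Explain quantum entanglement in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))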

Intended Use Cases

  • Mathematical problem solving and reasoning
  • Scientific research and analysis
  • Code generation and programming assistance
  • General question answering and text generation
  • Educational content creation

Model Architecture

The mixture-of-experts architecture enables efficient scaling by activating only a subset of parameters for each input, providing the benefits of a larger model while maintaining computational efficiency comparable to much smaller dense models.
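As an illustration of the routing idea only (not aquif-3-moe's actual router, whose expert count and top-k are not documented here), here is a toy top-k mixture-of-experts layer in PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k MoE feed-forward layer; sizes are illustrative, not aquif-3-moe's."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        gate_logits = self.router(x)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)                    # normalize the selected gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out  # only top_k of n_experts run per token

Because only the selected experts run for each token, per-token compute scales with the active-parameter count (2.8B here) rather than the 17B total.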

License

Apache 2.0
