aquif-3-moe (17B) Thinking

A high-performance mixture-of-experts (MoE) reasoning model for coding, science, and general use. With 17B total parameters and only 2.8B active parameters, aquif-3-moe delivers competitive performance across multiple domains while keeping compute requirements low.

Model Details

Architecture: Mixture of Experts (MoE)
Total Parameters: 17 billion
Active Parameters: 2.8 billion
License: Apache 2.0
Library: transformers
Tensor type: BF16 (Safetensors)

Performance Benchmarks

Benchmark Comparison Chart
Metric                 | aquif-3-moe (Thinking 17B a2.8B) | Phi-4 (Thinking 14B) | Qwen3 (Thinking 8B) | DeepSeek R1 (Qwen3 8B) | Magistral Small (24B) | Gemini 2.5 Flash-Lite (Propr.)
LiveCodeBench (Coding) | 63.2                             | 53.8                 | 58.1                | 60.5                   | 51.4                  | 59.3
AIME 2024 (Math)       | 80.2                             | 75.3                 | 74.7                | 65.0                   | 71.3                  | 70.3
GPQA Diamond (Science) | 64.2                             | 65.8                 | 62.0                | 61.1                   | 64.1                  | 62.5
Average                | 69.2                             | 65.0                 | 64.9                | 62.2                   | 62.3                  | 64.0
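
The Average row is the unweighted mean of the three benchmark scores above; for aquif-3-moe, for example, (63.2 + 80.2 + 64.2) / 3 ≈ 69.2.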

Key Strengths

  • Mathematical Reasoning: Achieves 91.4% on MATH-500, demonstrating exceptional mathematical problem-solving capabilities
  • Scientific Understanding: Scores 64.2 on GPQA Diamond (see the table above), showing strong scientific reasoning
  • Efficiency: Delivers competitive performance with only 2.8B active parameters
  • General Knowledge: Strong MMLU performance at 83.2%

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-3-moe-17b-a2.8b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # checkpoint weights are BF16
    device_map="auto",           # requires accelerate; spreads layers over available devices
)

# Generate text
inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
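
For conversational or reasoning prompts, the same model and tokenizer can be driven through the standard transformers chat-template API. The snippet below is a minimal sketch continuing from the example above; whether this repository ships a chat template, and how it delimits the model's thinking traces, is an assumption not confirmed by this card, so check tokenizer.chat_template first.

# Chat-style generation (sketch). Assumes the tokenizer defines a chat template;
# inspect tokenizer.chat_template before relying on this.
messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, dropping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))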

Intended Use Cases

  • Mathematical problem solving and reasoning
  • Scientific research and analysis
  • Code generation and programming assistance
  • General question answering and text generation
  • Educational content creation

Model Architecture

The mixture-of-experts architecture enables efficient scaling by activating only a subset of parameters for each input, providing the benefits of a larger model while maintaining computational efficiency comparable to much smaller dense models.
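
As a rough illustration of that idea (the expert count, hidden sizes, and top-k value below are arbitrary placeholders, not aquif-3-moe's actual configuration), a sparse MoE feed-forward layer routes each token through only the few experts selected by a learned gate:

import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    # Illustrative top-k mixture-of-experts feed-forward layer.
    # Sizes (8 experts, top-2 routing) are placeholders, NOT aquif-3-moe's real config.
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ]
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.gate(x)                  # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: every expert contributes to
        # the total parameter count, but per-token compute is ~top_k expert MLPs.
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out


# Toy usage: 4 tokens, each routed to 2 of the 8 experts
tokens = torch.randn(4, 512)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 512])

Per token, only top_k of the n_experts MLPs execute, which is how total parameter count (17B here) can grow while per-token compute tracks only the active parameters (2.8B).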

License

Apache 2.0
