aquif-3
Our most advanced models for 2025.
A high-performance mixture-of-experts language model optimized for efficiency, coding, science, and general use. With 17B total parameters and 2.8B active parameters, aquif-3-moe delivers competitive performance across multiple domains while maintaining computational efficiency.
Architecture: Mixture of Experts (MoE)
Total Parameters: 17 billion
Active Parameters: 2.8 billion
License: Apache 2.0
Library: transformers
| Metric | aquif-3-moe (Thinking, 17B a2.8B) | Phi-4 (Thinking, 14B) | Qwen3 (Thinking, 8B) | DeepSeek R1 (Qwen3 8B) | Magistral Small (24B) | Gemini 2.5 Flash-Lite (proprietary) |
|---|---|---|---|---|---|---|
| LiveCodeBench (Coding) | 63.2 | 53.8 | 58.1 | 60.5 | 51.4 | 59.3 |
| AIME 2024 (Math) | 80.2 | 75.3 | 74.7 | 65.0 | 71.3 | 70.3 |
| GPQA Diamond (Science) | 64.2 | 65.8 | 62.0 | 61.1 | 64.1 | 62.5 |
| Average | 69.2 | 65.0 | 64.9 | 62.2 | 62.3 | 64.0 |
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
model_name = "aquiffoo/aquif-3-moe-17b-a2.8b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text from a plain prompt and decode the result
inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
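For conversational prompts, the tokenizer's chat template can be applied before generation, assuming the model ships one on the Hub; the message content below is only an illustration.

# Optional: chat-style usage (assumes the model provides a chat template)
messages = [{"role": "user", "content": "Explain quantum entanglement."}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
chat_outputs = model.generate(chat_inputs, max_new_tokens=512)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))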
The mixture-of-experts architecture enables efficient scaling by activating only a subset of parameters for each input, providing the benefits of a larger model while maintaining computational efficiency comparable to much smaller dense models.
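For aquif-3-moe this means roughly 2.8B of the 17B parameters (about 16%) are exercised per token. The sketch below illustrates the general top-k routing pattern used in MoE layers; the hidden size, expert count, and k value are illustrative placeholders, not the actual aquif-3-moe configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    # Illustrative MoE feed-forward layer: a router picks top_k experts per token,
    # so only a fraction of the layer's parameters run for any given input.
    def __init__(self, hidden_size=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = self.router(x)                          # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts process each token; the rest stay idle,
        # which is why active parameters are far fewer than total parameters.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: each of the 4 tokens is routed to 2 of the 8 experts
y = TopKMoELayer()(torch.randn(4, 512))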