aquif-3
Our most advanced models for 2025.
A high-performance mixture-of-experts language model optimized for efficiency, coding, science, and general use. With 17B total parameters and 2.8B active parameters, aquif-3-moe delivers competitive performance across multiple domains while maintaining computational efficiency.
Architecture: Mixture of Experts (MoE)
Total Parameters: 17 billion
Active Parameters: 2.8 billion
License: Apache 2.0
Library: transformers
| Metric | aquif-3-moe (Thinking, 17B a2.8B) | Phi-4 (Thinking, 14B) | Qwen3 (Thinking, 8B) | DeepSeek R1 (Qwen3 8B) | Magistral Small (24B) | Gemini 2.5 Flash-Lite (proprietary) |
|---|---|---|---|---|---|---|
| LiveCodeBench (Coding) | 63.2 | 53.8 | 58.1 | 60.5 | 51.4 | 59.3 |
| AIME 2024 (Math) | 80.2 | 75.3 | 74.7 | 65.0 | 71.3 | 70.3 |
| GPQA Diamond (Science) | 64.2 | 65.8 | 62.0 | 61.1 | 64.1 | 62.5 |
| Average | 69.2 | 65.0 | 64.9 | 62.2 | 62.3 | 64.0 |
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
model_name = "aquiffoo/aquif-3-moe-17b-a2.8b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text from a plain prompt and decode the result
inputs = tokenizer("Explain quantum entanglement:", return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
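For conversational prompts, the tokenizer's chat template can be applied before generation, assuming the model ships one on the Hub; the message content below is only an illustration.

# Optional: chat-style usage (assumes the model provides a chat template)
messages = [{"role": "user", "content": "Explain quantum entanglement."}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
chat_outputs = model.generate(chat_inputs, max_new_tokens=512)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))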
The mixture-of-experts architecture enables efficient scaling by activating only a subset of parameters for each input, providing the benefits of a larger model while maintaining computational efficiency comparable to much smaller dense models.
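For aquif-3-moe this means roughly 2.8B of the 17B parameters (about 16%) are exercised per token. The sketch below illustrates the general top-k routing pattern used in MoE layers; the hidden size, expert count, and k value are illustrative placeholders, not the actual aquif-3-moe configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    # Illustrative MoE feed-forward layer: a router picks top_k experts per token,
    # so only a fraction of the layer's parameters run for any given input.
    def __init__(self, hidden_size=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = self.router(x)                          # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts process each token; the rest stay idle,
        # which is why active parameters are far fewer than total parameters.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: each of the 4 tokens is routed to 2 of the 8 experts
y = TopKMoELayer()(torch.randn(4, 512))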