aquif-moe-400m

aquif-moe-400m is our compact Mixture of Experts (MoE) model, with only 400 million active parameters. It delivers strong performance per gigabyte of VRAM, making it a good choice for resource-limited setups.

Model Overview

  • Name: aquif-moe-400m
  • Parameters: 400 million active parameters (1.3 billion total)
  • Context Window: 128,000 tokens
  • Architecture: Mixture of Experts (MoE)
  • Type: General-purpose LLM
  • Hosted on: Ollama, Huggingface
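
The weights are also published on Hugging Face as BF16 safetensors. Below is a minimal loading sketch with the transformers library; it assumes the checkpoint is compatible with AutoModelForCausalLM (a custom MoE architecture may require trust_remote_code=True), so treat it as illustrative rather than an official usage recipe.

```python
# Illustrative sketch only: assumes the Hugging Face checkpoint loads via AutoModelForCausalLM.
# A custom MoE architecture may require trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aquiffoo/aquif-moe-400m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # weights are published in BF16
    trust_remote_code=True,
)

inputs = tokenizer("Summarize the benefits of MoE models:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```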

Key Features

  • Highly efficient VRAM utilization (77.3 performance points per GB; see VRAM Efficiency below)
  • Expansive 128K token context window for handling long documents
  • Competitive performance despite fewer parameters
  • Optimized for local inference on consumer hardware
  • Ideal for resource-constrained environments
  • Supports high-throughput concurrent sessions

Performance Benchmarks

aquif-moe-400m delivers solid performance across multiple benchmarks, especially for its size:

| Benchmark | aquif-moe (0.4b) | Qwen 2.5 (0.5b) | Gemma 3 (1b) |
|-----------|------------------|-----------------|--------------|
| MMLU      | 26.6             | 45.4            | 26.5         |
| HumanEval | 32.3             | 22.0            | 8.1          |
| GSM8K     | 33.9             | 36.0            | 6.1          |
| Average   | 30.9             | 34.4            | 11.3         |

VRAM Efficiency

aquif-moe-400m excels in VRAM efficiency:

| Model     | Average Performance | VRAM (GB) | Performance per GB of VRAM |
|-----------|---------------------|-----------|----------------------------|
| aquif-moe | 30.9                | 0.4       | 77.3                       |
| Qwen 2.5  | 34.4                | 0.6       | 57.3                       |
| Gemma 3   | 11.3                | 1.0       | 11.3                       |

Use Cases

  • Edge computing and resource-constrained environments
  • Mobile and embedded applications
  • Local development environments
  • Quick prototyping and testing
  • Personal assistants on consumer hardware
  • Enterprise deployment with multiple concurrent sessions
  • Long document analysis and summarization
  • High-throughput production environments

Limitations

  • No thinking mode capability
  • May hallucinate on some topics
  • May struggle with more complex reasoning tasks
  • Not optimized for specialized domains

Getting Started

To run via Ollama:

```
ollama run aquiffoo/aquif-moe-400m
```
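
Once the model has been pulled, it can also be called programmatically through Ollama's local REST API (default port 11434). The sketch below is an assumption-laden example, not official usage: the input file name, prompt, and the enlarged num_ctx value are illustrative, the latter chosen to exercise the long context window.

```python
# Minimal sketch: query a locally running Ollama server for this model.
# Assumes Ollama is serving on the default port and the model has already been pulled.
import json
import urllib.request

payload = {
    "model": "aquiffoo/aquif-moe-400m",
    # "report.txt" is a hypothetical long document to summarize.
    "prompt": "Summarize the following document:\n\n" + open("report.txt").read(),
    "stream": False,
    # Raise the context length for long inputs (the model supports up to 128K tokens).
    "options": {"num_ctx": 32768},
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result["response"])
```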