---
pipeline_tag: text-generation
inference: false
license: mit
library_name: transformers
tags:
  - llm
  - aquif
  - text-generation-inference
  - foundational
  - moe
  - aquif-AlphaMoE
  - aquif-3.5
language:
  - en
---

# aquif-AlphaMoE

aquif-AlphaMoE is the first foundational model designed entirely by aquif AI, marking a shift away from the third-party architectures used in aquif-3 and aquif-3.5 toward an in-house architecture family. Released on October 1, 2025, AlphaMoE debuts the AquifAlphaMoEForCausalLM design, a scalable Mixture-of-Experts (MoE) framework that balances efficiency, reasoning, and multilingual capability.

This release represents aquif AI’s first step into independent foundational model architecture design, with a focus on modular expert scaling, long-context performance, and efficient parameter utilization.

## Model Repository Links

| Model | HuggingFace Repository |
|---|---|
| aquif-AlphaMoE-7.5B-A3B | [aquif-ai/aquif-AlphaMoE-7.5B-A3B](https://huggingface.co/aquif-ai/aquif-AlphaMoE-7.5B-A3B) |
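
For a quick start, the model should load through the standard `transformers` text-generation pipeline. The snippet below is a minimal sketch rather than official usage: it assumes the repository works with `pipeline(...)` and that `trust_remote_code=True` is required for the custom AquifAlphaMoEForCausalLM architecture.

```python
# Minimal quick-start sketch (assumed usage, not official documentation).
# trust_remote_code=True is an assumption for the custom architecture.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="aquif-ai/aquif-AlphaMoE-7.5B-A3B",
    trust_remote_code=True,
    device_map="auto",
)

result = generator(
    "Explain mixture-of-experts language models in one paragraph.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```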

## Model Overview

| Model | Total Params (B) | Active Params (B) | Experts (Total / Active) | Context | Attention | Vocab Size | MMLU | GPQA-D | LiveCodeBench | Math-500 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| aquif-AlphaMoE-7.5B-A3B | 7.47 | 2.92 | 64 / 4 | 164k | GQA (16 heads) | 128k | 86.7 | 60.1 | 35.9 | 87.3 | 67.5 |

## Performance Comparison

| Metric | AlphaMoE (7.5B A3B) | aquif-3-moe (17B A2.8B) | Ling-mini-2.0 (16B A1.4B) | Qwen3-Instruct-2507 (4B) | aquif-3.5 (7.3B) | Granite-4.0-HS (32B A9B) | Gemma-3 (12.2B) |
|---|---|---|---|---|---|---|---|
| MMLU | 84.3 | 83.2 | 80.9 | 81.6 | 78.5 | 78.5 | 78.5 |
| GPQA-Diamond | 57.5 | 56.7 | 54.3 | 49.6 | 42.3 | 41.6 | 34.9 |
| LiveCodeBench | 35.9 | 28.6 | 34.8 | 31.9 | 21.3 | 25.1 | 13.7 |
| Math-500 | 87.3 | 91.4 | 89.4 | 84.4 | 90.2 | 85.4 | 82.4 |
| Average | 66.3 | 65.0 | 64.9 | 61.9 | 58.1 | 57.7 | 52.4 |

## Key Features

- **First Foundational Architecture**: Designed from scratch by aquif AI, unlike aquif-3 and 3.5, which relied on third-party bases.
- **Scalable MoE Design**: 64 total experts with 4 active per token, enabling dynamic compute allocation (see the routing sketch after this list).
- **High Efficiency**: 7.47B total parameters but only 2.92B active, delivering strong performance-to-compute ratios.
- **Extended Context**: 164k-token context window for long-form reasoning and document handling.
- **Strong Benchmarks**: Surpasses previous aquif generations and peer models in general knowledge, science, and code tasks.
- **Multilingual Support**: Optimized for 10+ major languages, ensuring broad usability.
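
To illustrate what "64 total experts with 4 active per token" means in practice, here is a generic top-k MoE routing layer. This is a conceptual sketch only; the layer sizes, module names, and routing details are hypothetical and are not taken from the actual AquifAlphaMoEForCausalLM implementation.

```python
# Conceptual sketch of top-4-of-64 expert routing; all dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, k=4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # one score per expert per token
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the 4 selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # only the selected experts run
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(8, 1024)
print(moe(tokens).shape)  # torch.Size([8, 1024]); 4 of 64 experts run per token
```

The per-token loop is written for readability; production MoE kernels batch all tokens assigned to the same expert into a single matrix multiply.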

## Technical Specifications

- **Architecture Name**: AquifAlphaMoEForCausalLM
- **Total Parameters**: 7.47B
- **Active Parameters**: 2.92B
- **Total Experts**: 64
- **Active Experts**: 4
- **Context Window**: 164k tokens
- **Attention Mechanism**: GQA with 16 heads
- **Vocabulary Size**: 128k
- **Supported Precisions**: FP16, BF16 (see the loading sketch after this list)
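
Since the specifications list FP16 and BF16 as supported precisions, the sketch below shows explicit BF16 loading with `AutoModelForCausalLM`. As above, `trust_remote_code=True` and the generation settings are assumptions, not values from official aquif documentation.

```python
# BF16 loading sketch (assumed usage; trust_remote_code is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "aquif-ai/aquif-AlphaMoE-7.5B-A3B"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # FP16 (torch.float16) is also listed as supported
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer(
    "Sparse mixture-of-experts models are efficient because",
    return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```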

## License

This project is released under the MIT license (previously Apache 2.0). See the LICENSE file for details.


Made in 🇧🇷

© 2025 aquif AI. All rights reserved.