File size: 5,944 Bytes

---
license: apache-2.0
language:
  - en
  - fr
  - de
  - es
  - pt
  - it
  - ja
  - ko
  - ru
  - zh
  - ar
  - fa
  - id
  - ms
  - ne
  - pl
  - ro
  - sr
  - sv
  - tr
  - uk
  - vi
  - hi
  - bn
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
library_name: vllm
inference: false

---

# Aqui-VL 24B Mistral

Aqui-VL 24B Mistral is an advanced language model based on Mistral Small 3.1, designed to deliver exceptional performance while remaining accessible on consumer-grade hardware. This is the first open weights model from Aqui Solutions, the company behind [AquiGPT](https://aquigpt.com.br). With 23.6 billion parameters, it can run efficiently on a single RTX 4090 GPU or a 32GB Mac, making cutting-edge AI capabilities available to researchers, developers, and enthusiasts.

## Key Features

- **Consumer Hardware Compatible**: Runs on single RTX 4090 or 32GB Mac
- **Multimodal Capabilities**: Text, vision, chart analysis, and document understanding
- **128K Context Window**: Handle long documents and complex conversations
- **Strong Instruction Following**: Significantly improved over base Mistral Small 3.1
- **Exceptional Code Generation**: Best-in-class coding performance

## Hardware Requirements

### Minimum Requirements
- **GPU**: RTX 4090 (24GB VRAM) or equivalent
- **Mac**: 32GB unified memory (Apple Silicon recommended)
- **RAM**: 32GB system memory (for GPU setups)
- **Storage**: 20GB available space (for model and overhead)

### Recommended Setup
- **GPU**: RTX 4090 with adequate cooling
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster model loading

## Performance Benchmarks

Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:

| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|-----------|------------------|-------------------|----------------|
| **IFEval** (Instruction Following) | **88.3%** | 82.6% | 87.5% |
| **MMLU** (General Knowledge) | 80.9% | 80.5% | **86.0%** |
| **GPQA** (Science Q&A) | 44.7% | 44.4% | **46.7%** |
| **HumanEval** (Coding) | **92.5%** | 88.9% | 80.5% |
| **MATH** (Mathematics) | 69.3% | **69.5%** | 68.0% |
| **MMMU** (General Vision) | **64.0%** | 62.5% | N/A* |
| **ChartQA** (Chart Analysis) | **87.6%** | 86.2% | N/A* |
| **DocVQA** (Document Analysis) | **94.9%** | 94.1% | N/A* |
| **Average Text Performance** | **75.1%** | 73.2% | 73.7% |
| **Average Vision Performance** | **82.2%** | 80.9% | N/A* |

*Llama 3.1 70B does not include vision capabilities

## Model Specifications

- **Parameters**: 23.6 billion
- **Context Window**: 128,000 tokens
- **Knowledge Cutoff**: December 2023
- **Architecture**: mistral (transformer-based with vision)
- **Languages**: Multilingual support with strong English, French and Portuguese performance

## Installation & Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text
prompt = "Explain quantum computing in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Using with Ollama

```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```

### Using with llama.cpp

```bash
# Download quantized model (Q4_K_M, 14.4GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp
./main -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```

## Use Cases

### Code Generation & Programming
With an 88.9% score on HumanEval, Aqui-VL 24B Mistral excels at:
- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation

### Document & Chart Analysis
Strong vision capabilities enable:
- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis

### General Assistance
- Research and information synthesis
- Creative writing and content generation
- Mathematical problem solving
- Multilingual translation and communication

## Quantization

Aqui-VL 24B Mistral is available exclusively in Q4_K_M quantization, optimized for the best balance of performance and hardware compatibility:

- **Format**: Q4_K_M quantization
- **Size**: 14.4GB
- **VRAM Usage**: ~16GB (with overhead)
- **Compatible with**: RTX 4090, 32GB Mac, and similar hardware
- **Performance**: Excellent quality retention with 4-bit quantization

## Fine-tuning & Customization

Aqui-VL 24B Mistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks

## Limitations

- Knowledge cutoff at December 2023
- May occasionally produce hallucinations
- Performance varies with quantization level
- Requires significant computational resources for optimal performance

## License

This model is released under the [Apache 2.0 License](LICENSE), making it suitable for both research and commercial applications.

## Support

For questions and support regarding Aqui-VL 24B Mistral, please visit the [Hugging Face repository](https://huggingface.co/aquigpt/aqui-vl-24b) and use the community discussions section.

## Acknowledgments

Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for tools and datasets that made this model possible.

---

*Copyright 2025 Aqui Solutions. All rights reserved*