---
license: apache-2.0
language:
  - en
  - fr
  - de
  - es
  - pt
  - it
  - ja
  - ko
  - ru
  - zh
  - ar
  - fa
  - id
  - ms
  - ne
  - pl
  - ro
  - sr
  - sv
  - tr
  - uk
  - vi
  - hi
  - bn
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
library_name: vllm
inference: false

---

# Aqui-VL 24B Mistral

Aqui-VL 24B Mistral is a multimodal language model based on Mistral Small 3.1, designed to deliver strong performance while remaining usable on consumer-grade hardware. It is the first open-weights model from Aqui Solutions, the company behind [AquiGPT](https://aquigpt.com.br). With 23.6 billion parameters and a 14.4GB Q4_K_M quantization, it runs on a single RTX 4090 GPU or a 32GB Mac, putting these capabilities within reach of researchers, developers, and enthusiasts.

## Key Features

- **Consumer Hardware Compatible**: Runs on a single RTX 4090 or a 32GB Mac
- **Multimodal Capabilities**: Text, vision, chart analysis, and document understanding
- **128K Context Window**: Handles long documents and complex conversations
- **Strong Instruction Following**: 88.3% on IFEval, up from 82.6% for base Mistral Small 3.1
- **Strong Code Generation**: 92.5% on HumanEval, the highest of the three models in the benchmark table

## Hardware Requirements

### Minimum Requirements
- **GPU**: RTX 4090 (24GB VRAM) or equivalent
- **Mac**: 32GB unified memory (Apple Silicon recommended)
- **RAM**: 32GB system memory (for GPU setups)
- **Storage**: 20GB available space (for model and overhead)

### Recommended Setup
- **GPU**: RTX 4090 with adequate cooling
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster model loading

## Performance Benchmarks

Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:

| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|-----------|------------------|-------------------|----------------|
| **IFEval** (Instruction Following) | **88.3%** | 82.6% | 87.5% |
| **MMLU** (General Knowledge) | 80.9% | 80.5% | **86.0%** |
| **GPQA** (Science Q&A) | 44.7% | 44.4% | **46.7%** |
| **HumanEval** (Coding) | **92.5%** | 88.9% | 80.5% |
| **MATH** (Mathematics) | 69.3% | **69.5%** | 68.0% |
| **MMMU** (General Vision) | **64.0%** | 62.5% | N/A* |
| **ChartQA** (Chart Analysis) | **87.6%** | 86.2% | N/A* |
| **DocVQA** (Document Analysis) | **94.9%** | 94.1% | N/A* |
| **Average Text Performance** | **75.1%** | 73.2% | 73.7% |
| **Average Vision Performance** | **82.2%** | 80.9% | N/A* |

*Llama 3.1 70B does not include vision capabilities
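
The two average rows can be reproduced from the per-benchmark scores. The following is a minimal sketch, assuming each average is an unweighted mean rounded to one decimal place (the card does not state the method). The variable and function names are illustrative only.

```python
# Scores copied from the benchmark table, in percent.
# Text order: IFEval, MMLU, GPQA, HumanEval, MATH.
text_scores = {
    "Aqui-VL 24B Mistral": [88.3, 80.9, 44.7, 92.5, 69.3],
    "Mistral Small 3.1": [82.6, 80.5, 44.4, 88.9, 69.5],
    "Llama 3.1 70B": [87.5, 86.0, 46.7, 80.5, 68.0],
}
# Vision order: MMMU, ChartQA, DocVQA (Llama 3.1 70B has no vision scores).
vision_scores = {
    "Aqui-VL 24B Mistral": [64.0, 87.6, 94.9],
    "Mistral Small 3.1": [62.5, 86.2, 94.1],
}


def mean_score(scores):
    """Unweighted mean, rounded to one decimal (assumed averaging method)."""
    return round(sum(scores) / len(scores), 1)


text_avg = {model: mean_score(s) for model, s in text_scores.items()}
vision_avg = {model: mean_score(s) for model, s in vision_scores.items()}

for model, avg in text_avg.items():
    print(f"{model}: text average {avg}%")
for model, avg in vision_avg.items():
    print(f"{model}: vision average {avg}%")
```

Under that assumption, the computed values match the averages reported in the table.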

## Model Specifications

- **Parameters**: 23.6 billion
- **Context Window**: 128,000 tokens
- **Knowledge Cutoff**: December 2023
- **Architecture**: mistral (transformer-based with vision)
- **Languages**: Multilingual support with strong English, French and Portuguese performance
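
How much of the 128,000-token window is usable on a given machine depends on the key-value cache, which grows linearly with context length. The sketch below is a rough estimate only. The layer count, number of key-value heads, and head dimension are assumptions that do not appear in this card, so check them against the model's `config.json` before relying on the numbers. It also assumes an unquantized 16-bit cache.

```python
def kv_cache_gb(tokens, layers=40, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate key-value cache size in decimal gigabytes.

    The default architecture values are assumptions, not figures from the
    model card. The factor 2 counts one key and one value vector per layer.
    """
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 1e9


for tokens in (8_000, 32_000, 128_000):
    print(f"{tokens:>7} tokens: {kv_cache_gb(tokens):.1f} GB")
```

If the assumed values hold, a full-length context needs a cache larger than the quantized weights themselves, so shorter contexts or a quantized cache may be necessary on 24GB hardware.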

## Installation & Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

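# Load model and tokenizer
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# float16 weights for 23.6B parameters need roughly 47GB of memory;
# device_map="auto" spreads them across the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text
prompt = "Explain quantum computing in simple terms:"
# Move the inputs to the device that holds the model's first layers
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)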
```

### Using with Ollama

```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```
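
Once the model has been pulled, it can also be queried programmatically through Ollama's local HTTP API. This is a minimal sketch, assuming an Ollama server is running at its default address (`http://localhost:11434`) and that the model tag matches the pull command above. `build_payload` and `generate` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Assumption: default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the `ollama pull` command above.
MODEL_TAG = "aquiffoo/aqui-vl-24b"


def build_payload(prompt, model=MODEL_TAG):
    """Build a non-streaming request body for the generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, url=OLLAMA_URL, timeout=300):
    """Send the prompt to the server and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example (requires a running Ollama server):
# print(generate("Explain quantum computing in simple terms:"))
```

Setting `"stream": False` returns the whole completion in one JSON object, which keeps the example short. A streaming client would instead read the response line by line.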

### Using with llama.cpp

```bash
# Download quantized model (Q4_K_M, 14.4GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp (recent builds name this binary llama-cli instead of main)
./main -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```

## Use Cases

### Code Generation & Programming
With a 92.5% score on HumanEval, Aqui-VL 24B Mistral excels at:
- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation

### Document & Chart Analysis
Strong vision capabilities enable:
- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis

### General Assistance
- Research and information synthesis
- Creative writing and content generation
- Mathematical problem solving
- Multilingual translation and communication

## Quantization

Aqui-VL 24B Mistral is distributed in a single quantization, Q4_K_M, chosen to balance output quality against hardware compatibility:

- **Format**: Q4_K_M quantization
- **Size**: 14.4GB
- **VRAM Usage**: ~16GB (with overhead)
- **Compatible with**: RTX 4090, 32GB Mac, and similar hardware
- **Performance**: Excellent quality retention with 4-bit quantization
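
The figures above imply an effective storage cost per parameter, which can be compared with unquantized 16-bit weights. This is a back-of-the-envelope sketch, assuming decimal gigabytes and treating the whole file as weight data (file metadata and any non-weight tensors are ignored).

```python
PARAMS = 23.6e9       # parameter count from this card
FILE_SIZE_GB = 14.4   # Q4_K_M file size from this card
BYTES_PER_GB = 1e9    # assumption: decimal gigabytes


def bits_per_weight(file_size_gb, params):
    """Average number of bits stored per parameter."""
    return file_size_gb * BYTES_PER_GB * 8 / params


def fp16_size_gb(params):
    """Size of the same parameters at 2 bytes each."""
    return params * 2 / BYTES_PER_GB


q4_bpw = bits_per_weight(FILE_SIZE_GB, PARAMS)
fp16_gb = fp16_size_gb(PARAMS)

print(f"Q4_K_M: {q4_bpw:.2f} bits per weight")
print(f"float16: {fp16_gb:.1f} GB")
```

This works out to about 4.88 bits per weight, against 47.2GB for float16, which is why the quantized file fits in 24GB of VRAM while 16-bit weights do not.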

## Fine-tuning & Customization

Aqui-VL 24B Mistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks

## Limitations

- Knowledge cutoff at December 2023
- May occasionally produce hallucinations
- Performance varies with quantization level
- Requires significant computational resources for optimal performance

## License

This model is released under the [Apache 2.0 License](LICENSE), making it suitable for both research and commercial applications.

## Support

For questions and support regarding Aqui-VL 24B Mistral, please visit the [Hugging Face repository](https://huggingface.co/aquigpt/aqui-vl-24b) and use the community discussions section.

## Acknowledgments

Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for tools and datasets that made this model possible.

---

*Copyright 2025 Aqui Solutions. All rights reserved*