---
license: apache-2.0
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
library_name: vllm
inference: false
---
# Aqui-VL 24B Mistral
Aqui-VL 24B Mistral is a vision-language model based on Mistral Small 3.1, designed to deliver strong performance while remaining accessible on consumer-grade hardware. This is the first open-weights model from Aqui Solutions, the company behind [AquiGPT](https://aquigpt.com.br). With 23.6 billion parameters, it runs on a single RTX 4090 GPU or a 32GB Mac, making cutting-edge AI capabilities available to researchers, developers, and enthusiasts.
## Key Features
- **Consumer Hardware Compatible**: Runs on single RTX 4090 or 32GB Mac
- **Multimodal Capabilities**: Text, vision, chart analysis, and document understanding
- **128K Context Window**: Handle long documents and complex conversations
- **Strong Instruction Following**: Significantly improved over base Mistral Small 3.1
- **Exceptional Code Generation**: Best-in-class coding performance
## Hardware Requirements
### Minimum Requirements
- **GPU**: RTX 4090 (24GB VRAM) or equivalent
- **Mac**: 32GB unified memory (Apple Silicon recommended)
- **RAM**: 32GB system memory (for GPU setups)
- **Storage**: 20GB available space (for model and overhead)
### Recommended Setup
- **GPU**: RTX 4090 with adequate cooling
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster model loading
## Performance Benchmarks
Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:
| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|-----------|------------------|-------------------|----------------|
| **IFEval** (Instruction Following) | **88.3%** | 82.6% | 87.5% |
| **MMLU** (General Knowledge) | 80.9% | 80.5% | **86.0%** |
| **GPQA** (Science Q&A) | 44.7% | 44.4% | **46.7%** |
| **HumanEval** (Coding) | **92.5%** | 88.9% | 80.5% |
| **MATH** (Mathematics) | 69.3% | **69.5%** | 68.0% |
| **MMMU** (General Vision) | **64.0%** | 62.5% | N/A* |
| **ChartQA** (Chart Analysis) | **87.6%** | 86.2% | N/A* |
| **DocVQA** (Document Analysis) | **94.9%** | 94.1% | N/A* |
| **Average Text Performance** | **75.1%** | 73.2% | 73.7% |
| **Average Vision Performance** | **82.2%** | 80.9% | N/A* |
\*Llama 3.1 70B does not include vision capabilities.
## Model Specifications
- **Parameters**: 23.6 billion
- **Context Window**: 128,000 tokens
- **Knowledge Cutoff**: December 2023
- **Architecture**: mistral (transformer-based with vision)
- **Languages**: Multilingual support, with particularly strong performance in English, French, and Portuguese
## Installation & Usage
### Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text (do_sample=True so the temperature setting takes effect)
prompt = "Explain quantum computing in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Using with Ollama
```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```
### Using with llama.cpp
```bash
# Download quantized model (Q4_K_M, 14.4GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp (newer builds name this binary llama-cli)
./main -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```
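### Using with vLLM
The model card lists vLLM as its inference library. Below is a minimal offline-inference sketch; the `max_model_len` cap is an assumption chosen so the KV cache fits alongside the weights on a 24GB GPU.
```python
from vllm import LLM, SamplingParams

# Offline batch inference with vLLM.
llm = LLM(model="aquigpt/aqui-vl-24b", max_model_len=32768)  # assumed cap; the model supports up to 128K
params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.generate(["Explain quantum computing in simple terms:"], params)
print(outputs[0].outputs[0].text)
```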
## Use Cases
### Code Generation & Programming
With a 92.5% score on HumanEval (see the benchmarks above), Aqui-VL 24B Mistral excels at:
- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation
### Document & Chart Analysis
Strong vision capabilities enable the tasks below; a usage sketch follows the list:
- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis
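As a concrete illustration, a chart-analysis request might look like the sketch below. It assumes the repository ships processor configs compatible with the Transformers image-text-to-text interface (as Mistral Small 3.1 does); the image URL is a placeholder.
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_name = "aquigpt/aqui-vl-24b"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# One user turn combining an image and a question about it.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sales_chart.png"},  # placeholder URL
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.float16)

outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```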
### General Assistance
- Research and information synthesis
- Creative writing and content generation
- Mathematical problem solving
- Multilingual translation and communication
## Quantization
The GGUF release of Aqui-VL 24B Mistral is provided exclusively in Q4_K_M quantization, chosen as the best balance of quality and hardware compatibility (a Python loading sketch follows the list):
- **Format**: Q4_K_M quantization
- **Size**: 14.4GB
- **VRAM Usage**: ~16GB (with overhead)
- **Compatible with**: RTX 4090, 32GB Mac, and similar hardware
- **Performance**: Excellent quality retention with 4-bit quantization
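For reference, here is a minimal loading sketch with llama-cpp-python; the context length is an assumption chosen to fit within 24GB of VRAM and can be raised toward 128K if memory allows.
```python
from llama_cpp import Llama

# Load the Q4_K_M GGUF; n_gpu_layers=-1 offloads every layer to the GPU.
llm = Llama(
    model_path="aqui-vl-24b-q4_k_m.gguf",
    n_ctx=32768,      # assumed context budget; the model supports up to 128K
    n_gpu_layers=-1,
)

out = llm("Explain quantum computing in simple terms:", max_tokens=200, temperature=0.7)
print(out["choices"][0]["text"])
```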
## Fine-tuning & Customization
Aqui-VL 24B Mistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA; see the sketch after this list)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks
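A minimal sketch of the LoRA path using the peft library follows; the target module names are assumptions based on standard Mistral-style attention projections and may need adjusting for this checkpoint.
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("aquigpt/aqui-vl-24b", device_map="auto")

# Rank-16 adapters on the attention projections; module names are
# assumed from typical Mistral-style layers.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a small fraction of the 23.6B total
```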
## Limitations
- Knowledge cutoff at December 2023
- May occasionally produce hallucinations
- Performance varies with quantization level
- Requires significant computational resources for optimal performance
## License
This model is released under the [Apache 2.0 License](LICENSE), making it suitable for both research and commercial applications.
## Support
For questions and support regarding Aqui-VL 24B Mistral, please visit the [Hugging Face repository](https://huggingface.co/aquigpt/aqui-vl-24b) and use the community discussions section.
## Acknowledgments
Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for tools and datasets that made this model possible.
---
*Copyright 2025 Aqui Solutions. All rights reserved*