---
license: apache-2.0
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
library_name: vllm
inference: false
---

# Aqui-VL 24B Mistral

Aqui-VL 24B Mistral is an advanced language model based on Mistral Small 3.1, designed to deliver exceptional performance while remaining accessible on consumer-grade hardware. This is the first open-weights model from Aqui Solutions, the company behind [AquiGPT](https://aquigpt.com.br). With 23.6 billion parameters, it can run efficiently on a single RTX 4090 GPU or a 32GB Mac, making cutting-edge AI capabilities available to researchers, developers, and enthusiasts.

## Key Features

- **Consumer Hardware Compatible**: Runs on a single RTX 4090 or a 32GB Mac
- **Multimodal Capabilities**: Text, vision, chart analysis, and document understanding
- **128K Context Window**: Handles long documents and complex conversations
- **Strong Instruction Following**: Significantly improved over base Mistral Small 3.1
- **Exceptional Code Generation**: Best-in-class coding performance

## Hardware Requirements

### Minimum Requirements

- **GPU**: RTX 4090 (24GB VRAM) or equivalent
- **Mac**: 32GB unified memory (Apple Silicon recommended)
- **RAM**: 32GB system memory (for GPU setups)
- **Storage**: 20GB available space (for model and overhead)

### Recommended Setup

- **GPU**: RTX 4090 with adequate cooling
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster model loading

## Performance Benchmarks

Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:

| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|-----------|---------------------|-------------------|----------------|
| **IFEval** (Instruction Following) | **88.3%** | 82.6% | 87.5% |
| **MMLU** (General Knowledge) | 80.9% | 80.5% | **86.0%** |
| **GPQA** (Science Q&A) | 44.7% | 44.4% | **46.7%** |
| **HumanEval** (Coding) | **92.5%** | 88.9% | 80.5% |
| **MATH** (Mathematics) | 69.3% | **69.5%** | 68.0% |
| **MMMU** (General Vision) | **64.0%** | 62.5% | N/A* |
| **ChartQA** (Chart Analysis) | **87.6%** | 86.2% | N/A* |
| **DocVQA** (Document Analysis) | **94.9%** | 94.1% | N/A* |
| **Average Text Performance** | **75.1%** | 73.2% | 73.7% |
| **Average Vision Performance** | **82.2%** | 80.9% | N/A* |

*Llama 3.1 70B does not include vision capabilities

## Model Specifications

- **Parameters**: 23.6 billion
- **Context Window**: 128,000 tokens
- **Knowledge Cutoff**: December 2023
- **Architecture**: mistral (transformer-based with vision)
- **Languages**: Multilingual support with strong English, French, and Portuguese performance

## Installation & Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text; move inputs to the model's device so they match
# the weights placed by device_map="auto"
prompt = "Explain quantum computing in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Using with Ollama

```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```

### Using with llama.cpp

```bash
# Download quantized model (Q4_K_M, 14.4GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp
./main -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```
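### Using with vLLM

The card lists `vllm` as its library, so below is a minimal offline-inference sketch using vLLM's standard Python API. It assumes the `aquigpt/aqui-vl-24b` repository ships weights vLLM can load directly; Mistral-derived checkpoints sometimes require extra loader options (for example `tokenizer_mode="mistral"`), so check the repository files if loading fails. The `max_model_len` cap is an assumption made to fit a 24GB GPU.

```python
from vllm import LLM, SamplingParams

# Cap the context window so the KV cache fits alongside the weights
# on a 24GB GPU; raise this if you have more memory available.
llm = LLM(model="aquigpt/aqui-vl-24b", max_model_len=16384)

sampling = SamplingParams(temperature=0.7, max_tokens=200)
outputs = llm.generate(["Explain quantum computing in simple terms:"], sampling)
print(outputs[0].outputs[0].text)
```

The same repository can also be exposed as an OpenAI-compatible endpoint with vLLM's built-in server:

```bash
vllm serve aquigpt/aqui-vl-24b --max-model-len 16384
```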
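### Fine-tuning with LoRA (sketch)

The Fine-tuning & Customization section below lists LoRA and QLoRA support; as a starting point, here is a minimal sketch using the Hugging Face `peft` library. The rank, alpha, dropout, and target-module names are illustrative assumptions (the projection names follow the usual Mistral attention layout), not settings validated for this model.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "aquigpt/aqui-vl-24b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative hyperparameters; tune rank/alpha/dropout for your task.
# Target modules assume the standard Mistral attention projection names.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

From here the wrapped model trains with any standard Trainer loop; for QLoRA, load the base model in 4-bit (for example via `bitsandbytes`) before attaching the adapters.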
## Use Cases

### Code Generation & Programming

With a 92.5% score on HumanEval, Aqui-VL 24B Mistral excels at:

- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation

### Document & Chart Analysis

Strong vision capabilities enable:

- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis

### General Assistance

- Research and information synthesis
- Creative writing and content generation
- Mathematical problem solving
- Multilingual translation and communication

## Quantization

The quantized (GGUF) release of Aqui-VL 24B Mistral is provided exclusively in Q4_K_M, chosen for the best balance of performance and hardware compatibility:

- **Format**: Q4_K_M quantization
- **Size**: 14.4GB
- **VRAM Usage**: ~16GB (with overhead)
- **Compatible with**: RTX 4090, 32GB Mac, and similar hardware
- **Performance**: Excellent quality retention with 4-bit quantization

## Fine-tuning & Customization

Aqui-VL 24B Mistral supports:

- Parameter-efficient fine-tuning (LoRA, QLoRA; see the sketch under Installation & Usage)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks

## Limitations

- Knowledge cutoff at December 2023
- May occasionally produce hallucinations
- Performance varies with quantization level
- Requires significant computational resources for optimal performance

## License

This model is released under the [Apache 2.0 License](LICENSE), making it suitable for both research and commercial applications.

## Support

For questions and support regarding Aqui-VL 24B Mistral, please visit the [Hugging Face repository](https://huggingface.co/aquigpt/aqui-vl-24b) and use the community discussions section.

## Acknowledgments

Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for tools and datasets that made this model possible.

---

*Copyright 2025 Aqui Solutions. All rights reserved.*