---
license: apache-2.0
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
library_name: vllm
inference: false
---
# Aqui-VL 24B Mistral
Aqui-VL 24B Mistral is a vision-language model based on Mistral Small 3.1, designed to deliver strong performance while remaining accessible on consumer-grade hardware. This is the first open-weights model from Aqui Solutions, the company behind [AquiGPT](https://aquigpt.com.br). With 23.6 billion parameters, it runs on a single RTX 4090 GPU or a 32GB Mac, making cutting-edge AI capabilities available to researchers, developers, and enthusiasts.
## Key Features
- **Consumer Hardware Compatible**: Runs on single RTX 4090 or 32GB Mac
- **Multimodal Capabilities**: Text, vision, chart analysis, and document understanding
- **128K Context Window**: Handle long documents and complex conversations
- **Strong Instruction Following**: Significantly improved over base Mistral Small 3.1
- **Exceptional Code Generation**: Best-in-class coding performance
## Hardware Requirements
### Minimum Requirements
- **GPU**: RTX 4090 (24GB VRAM) or equivalent
- **Mac**: 32GB unified memory (Apple Silicon recommended)
- **RAM**: 32GB system memory (for GPU setups)
- **Storage**: 20GB available space (for model and overhead)
### Recommended Setup
- **GPU**: RTX 4090 with adequate cooling
- **CPU**: Modern multi-core processor
- **RAM**: 64GB+ for optimal performance
- **Storage**: NVMe SSD for faster model loading
## Performance Benchmarks
Aqui-VL 24B Mistral demonstrates competitive performance across multiple domains:
| Benchmark | Aqui-VL 24B Mistral | Mistral Small 3.1 | Llama 3.1 70B |
|-----------|------------------|-------------------|----------------|
| **IFEval** (Instruction Following) | **88.3%** | 82.6% | 87.5% |
| **MMLU** (General Knowledge) | 80.9% | 80.5% | **86.0%** |
| **GPQA** (Science Q&A) | 44.7% | 44.4% | **46.7%** |
| **HumanEval** (Coding) | **92.5%** | 88.9% | 80.5% |
| **MATH** (Mathematics) | 69.3% | **69.5%** | 68.0% |
| **MMMU** (General Vision) | **64.0%** | 62.5% | N/A* |
| **ChartQA** (Chart Analysis) | **87.6%** | 86.2% | N/A* |
| **DocVQA** (Document Analysis) | **94.9%** | 94.1% | N/A* |
| **Average Text Performance** | **75.1%** | 73.2% | 73.7% |
| **Average Vision Performance** | **82.2%** | 80.9% | N/A* |
\*Llama 3.1 70B does not include vision capabilities.
## Model Specifications
- **Parameters**: 23.6 billion
- **Context Window**: 128,000 tokens
- **Knowledge Cutoff**: December 2023
- **Architecture**: mistral (transformer-based with vision)
- **Languages**: Multilingual support, with particularly strong performance in English, French, and Portuguese
## Installation & Usage
### Quick Start with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "aquigpt/aqui-vl-24b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text (do_sample=True so the temperature setting takes effect)
prompt = "Explain quantum computing in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Using with Ollama
```bash
# Pull the model
ollama pull aquiffoo/aqui-vl-24b

# Run interactive chat
ollama run aquiffoo/aqui-vl-24b
```
### Using with llama.cpp
```bash
# Download quantized model (Q4_K_M, 14.4GB)
wget https://huggingface.co/aquigpt/aqui-vl-24b/resolve/main/aqui-vl-24b-q4_k_m.gguf

# Run with llama.cpp (newer builds name this binary llama-cli)
./main -m aqui-vl-24b-q4_k_m.gguf -p "Your prompt here" -n 100
```
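### Using with vLLM
The model card lists vLLM as its inference library. Below is a minimal offline-inference sketch; the `max_model_len` cap is an assumption chosen so the KV cache fits alongside the weights on a 24GB GPU.
```python
from vllm import LLM, SamplingParams

# Offline batch inference with vLLM.
llm = LLM(model="aquigpt/aqui-vl-24b", max_model_len=32768)  # assumed cap; the model supports up to 128K
params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.generate(["Explain quantum computing in simple terms:"], params)
print(outputs[0].outputs[0].text)
```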
## Use Cases
### Code Generation & Programming
With a 92.5% score on HumanEval (see the benchmarks above), Aqui-VL 24B Mistral excels at:
- Writing clean, efficient code in multiple languages
- Debugging and code review
- Algorithm implementation
- Technical documentation
### Document & Chart Analysis
Strong vision capabilities enable the tasks below; a usage sketch follows the list:
- PDF document analysis and Q&A
- Chart and graph interpretation
- Scientific paper comprehension
- Business report analysis
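As a concrete illustration, a chart-analysis request might look like the sketch below. It assumes the repository ships processor configs compatible with the Transformers image-text-to-text interface (as Mistral Small 3.1 does); the image URL is a placeholder.
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_name = "aquigpt/aqui-vl-24b"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# One user turn combining an image and a question about it.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sales_chart.png"},  # placeholder URL
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.float16)

outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```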
### General Assistance
- Research and information synthesis
- Creative writing and content generation
- Mathematical problem solving
- Multilingual translation and communication
## Quantization
The GGUF release of Aqui-VL 24B Mistral is provided exclusively in Q4_K_M quantization, chosen as the best balance of quality and hardware compatibility (a Python loading sketch follows the list):
- **Format**: Q4_K_M quantization
- **Size**: 14.4GB
- **VRAM Usage**: ~16GB (with overhead)
- **Compatible with**: RTX 4090, 32GB Mac, and similar hardware
- **Performance**: Excellent quality retention with 4-bit quantization
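For reference, here is a minimal loading sketch with llama-cpp-python; the context length is an assumption chosen to fit within 24GB of VRAM and can be raised toward 128K if memory allows.
```python
from llama_cpp import Llama

# Load the Q4_K_M GGUF; n_gpu_layers=-1 offloads every layer to the GPU.
llm = Llama(
    model_path="aqui-vl-24b-q4_k_m.gguf",
    n_ctx=32768,      # assumed context budget; the model supports up to 128K
    n_gpu_layers=-1,
)

out = llm("Explain quantum computing in simple terms:", max_tokens=200, temperature=0.7)
print(out["choices"][0]["text"])
```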
## Fine-tuning & Customization
Aqui-VL 24B Mistral supports:
- Parameter-efficient fine-tuning (LoRA, QLoRA; see the sketch after this list)
- Full fine-tuning for specialized domains
- Custom tokenizer training
- Multi-modal fine-tuning for specific vision tasks
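A minimal sketch of the LoRA path using the peft library follows; the target module names are assumptions based on standard Mistral-style attention projections and may need adjusting for this checkpoint.
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("aquigpt/aqui-vl-24b", device_map="auto")

# Rank-16 adapters on the attention projections; module names are
# assumed from typical Mistral-style layers.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a small fraction of the 23.6B total
```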
## Limitations
- Knowledge cutoff at December 2023
- May occasionally produce hallucinations
- Performance varies with quantization level
- Requires significant computational resources for optimal performance
## License
This model is released under the [Apache 2.0 License](LICENSE), making it suitable for both research and commercial applications.
## Support
For questions and support regarding Aqui-VL 24B Mistral, please visit the [Hugging Face repository](https://huggingface.co/aquigpt/aqui-vl-24b) and use the community discussions section.
## Acknowledgments
Built upon the excellent foundation of Mistral Small 3.1 by Mistral AI. Special thanks to the open-source community for tools and datasets that made this model possible.
---
*Copyright 2025 Aqui Solutions. All rights reserved*