---
license: apache-2.0
language:
- en
pipeline_tag: image-text-to-text
tags:
- multimodal
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# Qwen2.5-VL-7B-Instruct

Converted to GGUF and quantized with [HimariO's
fork](https://github.com/HimariO/llama.cpp.qwen2vl/tree/qwen25-vl) of llama.cpp, following [this
procedure](https://github.com/ggml-org/llama.cpp/issues/11483#issuecomment-2727577078).
No importance matrix (imatrix) was used.

The fork is currently required to run inference, and there is no guarantee that these checkpoints will work with future builds. Temporary builds are available [here](https://github.com/green-s/llama.cpp.qwen2vl/releases). The latest build tested at the time of writing is `qwen25-vl-b4899-bc4163b`.

[Original model](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)

## Usage

```bash
./llama-qwen2vl-cli -m Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf --mmproj qwen2.5-vl-7b-instruct-vision.gguf -p "Please describe this image." --image ./image.jpg
```
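
To describe several images in one go, the command above can be wrapped in a small loop. This is only a sketch: it uses just the flags shown above (`-m`, `--mmproj`, `-p`, `--image`), and the names `LLAMA_CLI`, `MODEL`, `MMPROJ`, and `describe_images` are hypothetical helpers introduced here, not part of the fork. It assumes the binary and both GGUF files sit in the current directory unless overridden.

```bash
#!/usr/bin/env bash
# Hypothetical defaults; override them from the environment if your paths differ.
LLAMA_CLI="${LLAMA_CLI:-./llama-qwen2vl-cli}"
MODEL="${MODEL:-Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf}"
MMPROJ="${MMPROJ:-qwen2.5-vl-7b-instruct-vision.gguf}"

# describe_images PROMPT IMAGE...
# Runs the CLI once per image with the same prompt, skipping paths that are not files.
describe_images() {
  local prompt="$1"
  shift
  local image
  for image in "$@"; do
    if [ ! -f "$image" ]; then
      echo "skipping missing file: $image" >&2
      continue
    fi
    echo "=== $image ==="
    "$LLAMA_CLI" -m "$MODEL" --mmproj "$MMPROJ" -p "$prompt" --image "$image"
  done
}
```

After sourcing the script, a call such as `describe_images "Please describe this image." ./photos/*.jpg` would run one inference per file. Each run invokes the CLI afresh, so the model is reloaded for every image; that keeps the sketch simple at the cost of speed.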