Is it vision capable?
Does this model support vision capabilities?
The model was actually designed specifically for vision, but it's not working with images for me.
Regards and thanks
Yes, it is definitely vision capable. We even offer two different quants of the vision tensors: https://huggingface.co/mradermacher/Qwen2.5-VL-32B-Instruct-abliterated-GGUF/blob/main/Qwen2.5-VL-32B-Instruct-abliterated.mmproj-f16.gguf and https://huggingface.co/mradermacher/Qwen2.5-VL-32B-Instruct-abliterated-GGUF/blob/main/Qwen2.5-VL-32B-Instruct-abliterated.mmproj-Q8_0.gguf. To use the vision capability in llama.cpp, you provide any quant you like for the LLM layers plus the mmproj file for the non-LLM layers such as vision or audio. llama-server supports all of these multimodal capabilities with a nice and intuitive user interface: https://github.com/ggml-org/llama.cpp/tree/master/tools/server
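For illustration, here is a minimal sketch of that workflow (the file names, port, and image path are placeholder assumptions, not something from this thread): start llama-server with both the LLM quant and the mmproj file, for example `llama-server -m Qwen2.5-VL-32B-Instruct-abliterated.Q4_K_M.gguf --mmproj Qwen2.5-VL-32B-Instruct-abliterated.mmproj-f16.gguf`, and then send an image through its OpenAI-compatible chat endpoint:

```python
# Minimal sketch: query a llama-server instance that was started with both the
# LLM GGUF (-m) and the mmproj GGUF (--mmproj). Host, port, and image path are
# assumptions.
import base64
import requests

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ]
}

# llama-server exposes an OpenAI-compatible chat completions endpoint.
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```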
Thanks for the information and clarification! Just to clarify: I'm purely a "user" without any real skills ;-)
My observation: I'm using the model via Ollama and Open WebUI. When I upload an image in the chat, the GPU memory starts to fill up as expected (RTX 4090 and AMD 5800X3D), but then nothing happens...
I also tested with "hf.co/mradermacher/Qwen2.5-VL-32B-Instruct-abliterated-GGUF:Q3_K_S" to rule out GPU memory size issues. (Just for testing: it works with Gemma3.)
The vision head wasn't included by huihui and embedded in the quant like it is for some vision models; use the projector as previously mentioned. Ollama, AFAIK, may not allow you to specify a projector in the Modelfile... YMMV.
This should be the solution for model creators to support vision models in Ollama?!
https://github.com/ollama/ollama/pull/11163/files
In February, which was like 500 AI years ago, they were still testing vision/audio-capable models in llama.cpp, from my recollection. I think Ollama has had vision capability (moondream2) since around 0.1.3 (over a year ago). That PR adds GGUF support, where I believe before it was just Diffusers. That's a good thing for a 32B; you should be able to fit Q6 on a 24 GB GPU. The projectors are small: 1+ GB for f16 and half that for Q8. IMHO it's easier to just write the Python; HF provides so much sample code that it makes it super easy. In fact, Transformers is moving so fast that I believe most quantization options are baked in now (I know torchao is, for sure). I won't be sad to drop lmdeploy.
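For reference, a minimal sketch along the lines of the Hugging Face sample code for Qwen2.5-VL (the model ID, image path, and generation settings here are placeholder assumptions; on a 24 GB card you would also want one of the quantization options mentioned above, e.g. the torchao integration):

```python
# Minimal sketch following the Qwen2.5-VL sample code pattern from Hugging Face.
# Model ID, image path, and generation settings are placeholders.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-32B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/example.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the prompt and pack the image(s) the way the processor expects.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```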
You should easily be able to run Gemma3-27B with your 4090. I run Q4_K_M with good results but could probably get to 6 bits without much effort. It's my default general model on my LLM machine, and I frequently find it more useful and stable than whatever "o" model OAI is pushing right now. The only other models I've used that get close to this quality are Intern72B and OpenGVLab/InternVL2_5-38B-MPO, which seem to work well with lmdeploy, TurboMind, etc., from my recollection.
To set the facts straight, llama.cpp has had vision support for much longer, and Ollama is based on llama.cpp.
I imported this model into Ollama via a Modelfile. When I run the model from the terminal in Ollama, it works, but when I try to run it via the API, it doesn't. Did someone get it to work via the API? If yes, could you please give an example of the Ollama Modelfile?
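For comparison, here is a minimal sketch of the kind of request the Ollama chat API expects for images (base64-encoded entries in an `images` field); the model name and image path are placeholders, and whether it answers still depends on the projector actually being available, as discussed above:

```python
# Minimal sketch of a multimodal request against the Ollama HTTP API.
# Assumptions: Ollama is running on the default port 11434 and the model name
# matches whatever was created from the Modelfile with `ollama create`.
import base64
import requests

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "qwen2.5-vl-32b-abliterated",  # placeholder model name
    "stream": False,
    "messages": [
        {
            "role": "user",
            "content": "Describe this image.",
            "images": [image_b64],  # Ollama expects base64-encoded image data here
        }
    ],
}

resp = requests.post("http://localhost:11434/api/chat", json=payload)
print(resp.json()["message"]["content"])
```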