Cannot input image in ollama for gemma-3-27b-it-GGUF:Q4_K_M
Model variant: gemma-3-27b-it-GGUF:Q4_K_M
I am hosting the model in Ollama and using the Python API to send requests to the model:
from ollama import Client

def read_image_text(host: str, model: str, image_path: str) -> str:
    client = Client(host=host)
    response = client.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Write the text in the image",
                "images": [image_path],
            }
        ],
    )
    return response['message']['content']
It raises the error:
ollama._types.ResponseError: Failed to create new sequence: failed to process inputs: this model is missing data required for image input
(status code: 500)
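To rule out path handling on the client side, the same request can also be sent with the image base64-encoded explicitly; the Ollama chat API expects base64-encoded images in the images array, and (depending on the client version) the Python client accepts a path, raw bytes, or a base64 string. A minimal sketch, reusing the same host/model/image_path values as above:

import base64

from ollama import Client

def read_image_text_b64(host: str, model: str, image_path: str) -> str:
    # Encode the image ourselves instead of letting the client resolve the path;
    # the chat endpoint takes base64 strings in the "images" array.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    client = Client(host=host)
    response = client.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Write the text in the image",
                "images": [image_b64],  # base64 string instead of a file path
            }
        ],
    )
    return response["message"]["content"]

If this still comes back with the same 500, the request format isn't the problem.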
I think this error is raised when the model does not support image input.
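One quick way to check what the server thinks the model can do is client.show(). A minimal sketch, assuming the default local port and the model tag used later in this thread (newer Ollama builds expose a capabilities list, older ones only the raw model metadata):

from ollama import Client

# Assumptions: Ollama on the default local port, and the tag from this thread.
client = Client(host="http://localhost:11434")
info = client.show("gemma-3-27b-it-GGUF:Q4_K_M")

# Depending on the Ollama version this is a dict or a ShowResponse object;
# either way, look for vision-related entries (projector/clip metadata or a
# capabilities list).
print(info)

If nothing vision-related shows up there, the model Ollama loaded is effectively text-only.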
+1
I'm having the same issue with the 4B version. I think it's something to do with the vision component not being properly linked to the model, but resolving that is a little beyond my skill set. The standard quantized versions hosted on Ollama work, so it must be something to do with how this one is configured.
Do you guys know if it works on llama.cpp? :)
Same issue here.
Yes, it works with llama.cpp (text only).
I tried the 4B version.
+1
curl http://localhost:11434/api/chat -d '{
  "model": "gemma-3-27b-it-GGUF:Q4_K_M",
  "messages": [{
    "role": "user",
    "content": "what is in this image?",
    "images": ["'"$(base64 -w 0 {(The_image_work_on_standard_version)} )"'"]
  }]
}'
{"error":"Failed to create new sequence: failed to process inputs: this model is missing data required for image input\n"}
Didn't look deeper yet but I believe all the "unslothed" versions of any model are only capable of text.
+1
Didn't look deeper yet but I believe all the "unslothed" versions of any model are only capable of text.
According to their blog, the vision capability was fixed, but it wasn't. I checked the 27B Q4_K_M model with tensor parallelism in llama.cpp and Ollama (the native Ollama quantized model works well). vLLM doesn't work at all. Perhaps I missed something on my side.
Edited:
Apologies, that was my mistake. I was testing this model on the llama.cpp server, but it doesn't currently support multimodality. Unfortunately, vLLM also doesn't work with Gemma 3 in GGUF format, which means this model is currently limited to local multimodal testing.
I would appreciate it if anyone has experience running this in multimodal mode on a server and could share their insights.
We're also looking forward to the new release of Unsloth's inference engine, which is expected to support multi-GPU configurations.
For Ollama I'm unsure exactly what the issue is, but I asked the Ollama folks. They have a unique way of doing GGUFs: they integrate the mmproj into the actual file, so unfortunately there's nothing we can do about it if it still doesn't work. Hopefully they will support separate mmproj files in the future :(
It works in other places like llama.cpp, etc.
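If anyone wants to check for themselves whether a given GGUF actually contains the vision tower/projector, the gguf Python package that ships with llama.cpp can list the tensors. A rough sketch (the file path is a placeholder, and the exact tensor-name prefixes vary between exports, so treat the v./mm. check as a heuristic):

# pip install gguf
from gguf import GGUFReader

# Placeholder path to the downloaded file.
reader = GGUFReader("gemma-3-27b-it-Q4_K_M.gguf")

tensor_names = [t.name for t in reader.tensors]
vision_tensors = [n for n in tensor_names if n.startswith(("v.", "mm."))]

print(f"total tensors: {len(tensor_names)}")
print(f"vision/projector tensors: {len(vision_tensors)}")
# An empty second list means the file holds the language model only, and the
# projector has to come from a separate mmproj GGUF.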
CC: @AndyNeSH @Jabarton @kitc @milankowww @nchatu @otacilio-psf @zcfrank1st