Cannot input image in ollama for gemma-3-27b-it-GGUF:Q4_K_M
Model variant: gemma-3-27b-it-GGUF:Q4_K_M
I am hosting the model in Ollama and using the Python API to send requests to the model:
from ollama import Client

def read_image_text(host: str, model: str, image_path: str) -> str:
    client = Client(host=host)
    response = client.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Write the text in the image",
                "images": [image_path],
            }
        ],
    )
    return response['message']['content']
It raises the error:
ollama._types.ResponseError: Failed to create new sequence: failed to process inputs: this model is missing data required for image input
(status code: 500)
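To rule out path handling on the client side, the same request can also be sent with the image base64-encoded explicitly; the Ollama chat API expects base64-encoded images in the images array, and (depending on the client version) the Python client accepts a path, raw bytes, or a base64 string. A minimal sketch, reusing the same host/model/image_path values as above:

import base64

from ollama import Client

def read_image_text_b64(host: str, model: str, image_path: str) -> str:
    # Encode the image ourselves instead of letting the client resolve the path;
    # the chat endpoint takes base64 strings in the "images" array.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    client = Client(host=host)
    response = client.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": "Write the text in the image",
                "images": [image_b64],  # base64 string instead of a file path
            }
        ],
    )
    return response["message"]["content"]

If this still comes back with the same 500, the request format isn't the problem.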
I think this error is raised when the model does not support image input.
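One quick way to check what the server thinks the model can do is client.show(). A minimal sketch, assuming the default local port and the model tag used later in this thread (newer Ollama builds expose a capabilities list, older ones only the raw model metadata):

from ollama import Client

# Assumptions: Ollama on the default local port, and the tag from this thread.
client = Client(host="http://localhost:11434")
info = client.show("gemma-3-27b-it-GGUF:Q4_K_M")

# Depending on the Ollama version this is a dict or a ShowResponse object;
# either way, look for vision-related entries (projector/clip metadata or a
# capabilities list).
print(info)

If nothing vision-related shows up there, the model Ollama loaded is effectively text-only.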
+1
I'm having the same issue with the 4B version. I think it's something to do with the vision component not being properly linked to the model, but resolving that is a little beyond my skill set. The standard quantized versions hosted on Ollama work, so it must be something to do with how this one is configured.
Do you guys know if it works on llama.cpp? :)
Same issue here.
Yes, it works with llama.cpp (text only).
I tried the 4B version.
+1
curl http://localhost:11434/api/chat -d '{
  "model": "gemma-3-27b-it-GGUF:Q4_K_M",
  "messages": [{
    "role": "user",
    "content": "what is in this image?",
    "images": ["'"$(base64 -w 0 {(The_image_work_on_standard_version)} )"'"]
  }]
}'
{"error":"Failed to create new sequence: failed to process inputs: this model is missing data required for image input\n"}
Didn't look deeper yet but I believe all the "unslothed" versions of any model are only capable of text.
+1
Didn't look deeper yet but I believe all the "unslothed" versions of any model are only capable of text.
According to their blog, the vision capability was fixed, but it wasn't. I checked the 27B Q4_K_M model with tensor parallelism in llama.cpp and Ollama (the native Ollama quantized model works well). vLLM doesn't work at all. Perhaps I missed something on my side.
Edited:
Apologies, that was my mistake. I was testing this model on the llama.cpp server, but it doesn't currently support multimodality. Unfortunately, vLLM also doesn't work with Gemma 3 in GGUF format, which means this model is currently limited to local multimodal testing.
I would appreciate it if anyone has experience running this in multimodal mode on a server and could share their insights.
We're also looking forward to the new release of Unsloth's inference engine, which is expected to support multi-GPU configurations.
For Ollama I'm unsure exactly what the issue is, but I asked the Ollama folks. They have a unique way of doing GGUFs: they integrate the mmproj into the actual file, so unfortunately there's nothing we can do about it if it still doesn't work. Hopefully they will support separate mmproj files in the future :(
It works in other places like llama.cpp, etc.
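If anyone wants to check for themselves whether a given GGUF actually contains the vision tower/projector, the gguf Python package that ships with llama.cpp can list the tensors. A rough sketch (the file path is a placeholder, and the exact tensor-name prefixes vary between exports, so treat the v./mm. check as a heuristic):

# pip install gguf
from gguf import GGUFReader

# Placeholder path to the downloaded file.
reader = GGUFReader("gemma-3-27b-it-Q4_K_M.gguf")

tensor_names = [t.name for t in reader.tensors]
vision_tensors = [n for n in tensor_names if n.startswith(("v.", "mm."))]

print(f"total tensors: {len(tensor_names)}")
print(f"vision/projector tensors: {len(vision_tensors)}")
# An empty second list means the file holds the language model only, and the
# projector has to come from a separate mmproj GGUF.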
CC: @AndyNeSH @Jabarton @kitc @milankowww @nchatu @otacilio-psf @zcfrank1st