Does not support multimodal input

#5
by RamboRogers - opened

Is there a way to make this multimodal like Gemma3 is?

Unsloth AI org

> Is there a way to make this multimodal like Gemma3 is?

Our upload does support multimodal. What are you using this on?

> Is there a way to make this multimodal like Gemma3 is?

+1

When trying to use the following conversation structure on unsloth/gemma-3-27b-it-unsloth-bnb-4bit:

```python
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]
```

The model returns the following:

["user\nYou are a helpful assistant.\n\nDescribe this image.\nmodel\nOkay, let's describe the image!\n\nThe image shows a cozy and inviting living room scene. Here's a breakdown of what I see:\n\n* **Setting:** It appears to be a living room, likely in a home. The style is warm and inviting, with a focus on comfort.\n"]

The actual content of the image is completely different. I also tried giving different URLs as input, but the answer is identical ("The image shows a cozy and inviting...").
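For what it's worth: the decoded prompt above contains no image tokens at all, which suggests the chat template was rendered as text only, so the image never actually reached the model and it hallucinated a description. Passing the messages through the processor with `tokenize=True` avoids that. A minimal sketch, assuming a recent transformers release with Gemma 3 support (`Gemma3ForConditionalGeneration` plus `AutoProcessor`); untested against the 4-bit upload specifically:

```python
# Minimal multimodal sketch for Gemma 3, assuming a recent transformers
# release with Gemma3ForConditionalGeneration; untested on the 4-bit upload.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "unsloth/gemma-3-27b-it-unsloth-bnb-4bit"
model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto").eval()
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

# tokenize=True / return_dict=True makes the processor fetch the image and
# insert the vision tokens; tokenizing the rendered template as plain text
# silently drops the image.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128)

input_len = inputs["input_ids"].shape[-1]
print(processor.decode(output[0][input_len:], skip_special_tokens=True))
```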

Unsloth AI org

> Is there a way to make this multimodal like Gemma3 is?
>
> +1

Where are you using this?

Maybe it's something that needs to be set in Ollama?

For Ollama, I'm unsure exactly what the issue is, but I asked the Ollama folks. They have a unique way of doing GGUFs: they integrate the mmproj into the actual file, so unfortunately there's nothing we can do about it if it still doesn't work. Hopefully they will support separate mmproj files in the future :(

It works in other places like llama.cpp, etc.
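If anyone wants to verify outside Ollama, here's a hypothetical invocation of llama.cpp's multimodal CLI with the mmproj kept as a separate file; the binary name varies by build (`llama-gemma3-cli` in older ones, `llama-mtmd-cli` in newer ones), and the GGUF filenames below are placeholders for whatever you downloaded:

```python
# Hypothetical example: calling llama.cpp's multimodal CLI with the mmproj
# as a separate file (unlike Ollama, which bakes it into the GGUF).
# Binary and file names are placeholders for your local build/downloads.
import subprocess

subprocess.run(
    [
        "./llama-mtmd-cli",                        # llama-gemma3-cli in older builds
        "-m", "gemma-3-27b-it-Q4_K_M.gguf",        # main model weights (placeholder)
        "--mmproj", "mmproj-gemma-3-27b-it.gguf",  # separate vision projector (placeholder)
        "--image", "bee.jpg",
        "-p", "Describe this image.",
    ],
    check=True,
)
```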

CC: @kallortz @RamboRogers
