Llama 4 GGUFs now with multimodal (image) capabilities.
Thanks to a PR in llama.cpp!
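To try it out, an invocation along these lines should work (a sketch; the file names are placeholders, so pick the quant and the matching mmproj file you actually downloaded):

# llama.cpp build with multimodal (mtmd) support; --image attaches a picture to the prompt
./llama-mtmd-cli \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  --image photo.jpg \
  -p "Describe this image."

llama-server also accepts the same -m/--mmproj pair if you prefer the OpenAI-compatible API.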
The mmproj files seem to all be 1.79 kb each. Is this intended, or is the upload broken?
Oh dear! Good catch. We're gonna reupload. Can't believe it keeps happening rip
Should be fixed now! @mingyi456
Has anyone successfully used vision capabilities with these files?
clip_init: failed to load model 'mmproj-F32.gguf': operator(): unable to find tensor mm.model.fc.weight
ggml_metal_free: deallocating
mtmd_init_from_file: error: Failed to load CLIP model from mmproj-F32.gguf
srv load_model: failed to load multimodal model, 'mmproj-F32.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
ggml_metal_free: deallocating
This is with llama.cpp version b5586.
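For reference, the error above comes from simply starting the server; it fails before any request is made. Roughly this (a sketch; the model file name is a placeholder):

# fails during clip_init, while loading the projector tensors
./llama-server \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F32.gguf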
@finding1
Is this simply from trying to load the mmproj file? If so, it might be a problem with the Metal backend. I tried version b5588 directly with llama-server, using the CUDA backend (with partial CPU offloading), and it loads the mmproj file successfully. However, it crashes when processing certain images, while other images work fine; I am not sure what it is about those images that causes the crashes.
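For context, my invocation was roughly the following (a sketch; the layer count is just an example, tune -ngl to your VRAM):

# CUDA build; -ngl sends that many layers to the GPU, the rest stay on the CPU
./llama-server \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  -ngl 20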
@finding1 I see the same thing using llama-b5604-bin-win-cuda-12.4-x64.
clip_init: failed to load model '~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf': operator(): unable to find tensor mm.model.fc.weight
mtmd_init_from_file: error: Failed to load CLIP model from ~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf
srv load_model: failed to load multimodal model, '~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
The same thing happens for both mmproj-F16.gguf and mmproj-F32.gguf. I've been trying to switch from LM Studio to llama-swap + llama.cpp, and strangely, GGUF files that run fine in LM Studio show this error when run with llama.cpp. I downloaded the model both with the HF CLI and through LM Studio to make sure I was getting the same files, and they are identical. LM Studio uses ggml/llama.cpp, but not the command-line tools; they compile and ship the dynamic libraries, so I'm not sure exactly how they run this or whether they apply patches of their own.
Also, Gemma 3 works with images for me under this setup, so it's not like vision support is completely broken on my end.
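In case it helps with reproducing, my llama-swap entry looks roughly like this (a sketch; it assumes llama-swap's YAML config and its ${PORT} macro, and the names and paths are placeholders):

models:
  "llama4-scout":
    cmd: >
      llama-server --port ${PORT}
      -m ~/llm/models/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf
      --mmproj ~/llm/models/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/mmproj-F16.gguf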
There's currently an issue in llama.cpp where the vision component doesn't work anymore. We tried ggml's quants as well and they don't work :(
@finding1
I fixed the mmproj - it was a llama.cpp bug.
@mingyi456
Yes, I can repro this - https://github.com/ggml-org/llama.cpp/pull/14247 should fix it so that any image is accepted
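Once that's merged, a quick way to sanity-check image handling is to hit llama-server's OpenAI-compatible endpoint (a sketch; assumes the default port 8080 and a local test.png; on macOS, base64 takes -i instead of -w0):

# send one image plus a text prompt to /v1/chat/completions
IMG=$(base64 -w0 test.png)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG"'"}}
      ]
    }]
  }'

If the fix works, this should return a description instead of crashing the server.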