Llama 4 GGUFs now with multimodal (image) capabilities.

#5 opened by shimmyshimmer
Unsloth AI org

Thanks to a PR in llama.cpp!
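To try the vision side once the files are downloaded, an invocation along these lines should work (a sketch; the quant filename, image path, and prompt are placeholders — use whichever quant you grabbed):

```
llama-mtmd-cli \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  --image your-image.jpg \
  -p "Describe this image."
```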

shimmyshimmer pinned discussion

The mmproj files seem to all be 1.79 kb each. Is this intended, or is the upload broken?

Unsloth AI org

> The mmproj files seem to all be 1.79 kb each. Is this intended, or is the upload broken?

Oh dear! Good catch. We're gonna reupload. Can't believe it keeps happening rip

Unsloth AI org

> The mmproj files seem to all be 1.79 kb each. Is this intended, or is the upload broken?

Should be fixed now! @mingyi456

I was anticipating these files, too, and the fixed uploads work great. Thanks, @shimmyshimmer !

Has anyone successfully used vision capabilities with these files?

```
clip_init: failed to load model 'mmproj-F32.gguf': operator(): unable to find tensor mm.model.fc.weight
ggml_metal_free: deallocating
mtmd_init_from_file: error: Failed to load CLIP model from mmproj-F32.gguf
srv    load_model: failed to load multimodal model, 'mmproj-F32.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
ggml_metal_free: deallocating
```

This is with llama.cpp version 5586.

@finding1 Is this simply from trying to load the mmproj file? If so, it might be a problem with the Metal backend. I tried version b5588 directly with llama-server, using the CUDA backend (with partial CPU offloading), and it successfully loads the mmproj file. However, it seems unable to process certain images and crashes instead, while other images work. I am unsure what it is about those images that causes the crashes, though.
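For reference, my launch was roughly the following (a sketch; the paths and the layer count are illustrative, not my exact values):

```
llama-server \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  -ngl 24 \
  --port 8080
```

Setting `-ngl` below the model's total layer count leaves the remaining layers on the CPU, which is what I mean by partial offloading.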

@finding1 I see the same thing using llama-b5604-bin-win-cuda-12.4-x64.

```
clip_init: failed to load model '~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf': operator(): unable to find tensor mm.model.fc.weight
mtmd_init_from_file: error: Failed to load CLIP model from ~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf
srv    load_model: failed to load multimodal model, '~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
```

Same thing happens for both mmproj-F16.gguf and mmproj-F32.gguf. I've been trying to switch from LM Studio to llama-swap + llama.cpp, and, strangely, the GGUF files that run successfully with LM Studio show this error when run with llama.cpp. I downloaded the model both with the HF CLI and through LM Studio to make sure I was getting the same model; the files are identical. LM Studio uses ggml/llama.cpp, but they don't use the command-line tools; they compile and ship the dynamic libraries. So I'm unsure exactly how they run this, or whether they have patches of their own applied.
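In case anyone wants to reproduce the setup, my llama-swap entry looks roughly like this (a sketch from memory; the model name, quant filename, and flags are illustrative):

```yaml
models:
  "llama4-scout":
    cmd: >
      llama-server --port ${PORT}
      -m ~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf
      --mmproj ~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf
      -ngl 99
```

llama-swap fills in `${PORT}` and starts the llama-server process on demand.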

Also, Gemma 3 works for me with images under this setup, so it's not like vision is completely broken.

shimmyshimmer unpinned discussion
Unsloth AI org

There's currently an issue in llama.cpp where the vision component doesn't work anymore. We tried with ggml's quants as well and they don't work :(

Unsloth AI org

@finding1 I fixed the mmproj - it was a llama.cpp bug.
@mingyi456 Yes, I can repro - https://github.com/ggml-org/llama.cpp/pull/14247 should fix it so that all images are accepted
