Llama 4 GGUFs now with multimodal (image) capabilities.
Thanks to a PR in llama.cpp!
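To try it out, an invocation along these lines should work (a sketch; the file names are placeholders, so pick the quant and the matching mmproj file you actually downloaded):

# llama.cpp build with multimodal (mtmd) support; --image attaches a picture to the prompt
./llama-mtmd-cli \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  --image photo.jpg \
  -p "Describe this image."

llama-server also accepts the same -m/--mmproj pair if you prefer the OpenAI-compatible API.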
The mmproj files seem to all be 1.79 kb each. Is this intended, or is the upload broken?
Oh dear! Good catch. We're gonna reupload. Can't believe it keeps happening rip
Should be fixed now! @mingyi456
Has anyone successfully used vision capabilities with these files?
clip_init: failed to load model 'mmproj-F32.gguf': operator(): unable to find tensor mm.model.fc.weight
ggml_metal_free: deallocating
mtmd_init_from_file: error: Failed to load CLIP model from mmproj-F32.gguf
srv load_model: failed to load multimodal model, 'mmproj-F32.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
ggml_metal_free: deallocating
This is with llama.cpp version b5586.
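For reference, the error above comes from simply starting the server; it fails before any request is made. Roughly this (a sketch; the model file name is a placeholder):

# fails during clip_init, while loading the projector tensors
./llama-server \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F32.gguf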
@finding1
Is this simply from trying to load the mmproj file? If so, it might be a problem with the Metal backend. I tried version b5588 directly with llama-server, using the CUDA backend (with partial CPU offloading), and it loads the mmproj file successfully. However, it crashes when processing certain images, while other images work fine; I am not sure what it is about those images that causes the crashes.
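For context, my invocation was roughly the following (a sketch; the layer count is just an example, tune -ngl to your VRAM):

# CUDA build; -ngl sends that many layers to the GPU, the rest stay on the CPU
./llama-server \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  -ngl 20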
@finding1 I see the same thing using llama-b5604-bin-win-cuda-12.4-x64.
clip_init: failed to load model '~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf': operator(): unable to find tensor mm.model.fc.weight
mtmd_init_from_file: error: Failed to load CLIP model from ~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf
srv load_model: failed to load multimodal model, '~\llm\models\unsloth\Llama-4-Scout-17B-16E-Instruct-GGUF\mmproj-F16.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
The same thing happens for both mmproj-F16.gguf and mmproj-F32.gguf. I've been trying to switch from LM Studio to llama-swap + llama.cpp, and strangely, GGUF files that run fine in LM Studio show this error when run with llama.cpp. I downloaded the model both with the HF CLI and through LM Studio to make sure I was getting the same files, and they are identical. LM Studio uses ggml/llama.cpp, but not the command-line tools; they compile and ship the dynamic libraries, so I'm not sure exactly how they run this or whether they apply patches of their own.
Also, Gemma 3 works with images for me under this setup, so it's not like vision support is completely broken on my end.
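In case it helps with reproducing, my llama-swap entry looks roughly like this (a sketch; it assumes llama-swap's YAML config and its ${PORT} macro, and the names and paths are placeholders):

models:
  "llama4-scout":
    cmd: >
      llama-server --port ${PORT}
      -m ~/llm/models/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf
      --mmproj ~/llm/models/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/mmproj-F16.gguf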
There's currently an issue in llama.cpp where the vision component doesn't work anymore. We tried ggml's quants as well and they don't work :(
@finding1
I fixed the mmproj - it was a llama.cpp bug.
@mingyi456
Yes, I can repro this - https://github.com/ggml-org/llama.cpp/pull/14247 should fix it so that any image is accepted
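Once that's merged, a quick way to sanity-check image handling is to hit llama-server's OpenAI-compatible endpoint (a sketch; assumes the default port 8080 and a local test.png; on macOS, base64 takes -i instead of -w0):

# send one image plus a text prompt to /v1/chat/completions
IMG=$(base64 -w0 test.png)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,'"$IMG"'"}}
      ]
    }]
  }'

If the fix works, this should return a description instead of crashing the server.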