No mmproj file?
This should be a vision model, but it seems the mmproj file is missing.
Edit: I see that the static quant version that you uploaded has the mmproj file, so I guess you forgot to include it here?
We only ever include the mmproj files in the static quant repository, since the vision layers can only be quantized statically, not with an imatrix. But I agree that they are currently incredibly hard to find: not only do they live only in the static repository, the model card doesn't even list them. I hope @mradermacher improves this in the next generation of model cards. In the meantime I recommend the download page available at https://hf.tst.eu/model, which not only shows you all the files in an intuitive way but also automatically concatenates the splits when you download a GGUF larger than 50 GB.
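If you'd rather script the download, something like this should also work; a minimal sketch assuming the standard huggingface_hub API, with placeholder repo and file names (check the static repository for the real mmproj filename):

```python
# Minimal sketch: fetch the mmproj file from the *static* quant repository.
# The repo ID and filename below are placeholders, not real paths.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mradermacher/SomeVisionModel-GGUF",  # static repo (placeholder)
    filename="SomeVisionModel.mmproj-f16.gguf",   # mmproj file (placeholder)
)
print(path)
```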
Just a silly question: could you theoretically imatrix-quant the vision layers using imatrix data that consists of images instead of text? I guess llama.cpp would need an update to allow using an imatrix for 8-bit quants specifically for mmproj files, as well as to allow more types of data for imatrix computation in general, but would that help quantization quality for the mmproj file?
Theoretically one could, but realistically, if you quantize the image layers to less than Q8 the quality degrades relatively quickly, while an imatrix only really starts to make sense at Q5 and only makes a big difference at Q4 and below, which is already too little precision for the vision layers. In any case it is not something supported by llama.cpp. For now I would just be happy to have more static vision quants available, because as you can see from this model you can only choose between Q8 and F16.
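Just to illustrate how fast plain round-to-nearest error grows below 8 bits, here is a toy sketch (nothing like llama.cpp's actual block-wise quant formats, just a single scale for the whole tensor):

```python
# Toy illustration: symmetric round-to-nearest quantization of a random
# weight tensor at several bit widths; error roughly doubles per bit removed.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)

for bits in (8, 6, 5, 4):
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 at 8 bits
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.round(w / scale).clip(-qmax, qmax)
    rmse = np.sqrt(np.mean((w - q * scale) ** 2))
    print(f"{bits}-bit RMS error: {rmse:.5f}")
```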
My idea is that since the image layers seem more sensitive to quantization, maybe they could benefit from an imatrix even at Q8, unlike the text layers which only benefit at Q5 or below. Or does it not work that way? I guess the only static quant alternatives would be the fancy new DF11 lossless format (which llama.cpp doesn't support either) or someone inventing a new lossy Q12 quant format. Or maybe the vision layers would benefit from the fancy K or IQ quants even at 8 bits, if someone decided to try creating such a quant preset.
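To make concrete what I mean by benefiting from an imatrix even at 8 bits, here is a pure toy sketch (my own illustration, not how llama.cpp actually applies an imatrix): instead of taking the plain max-abs scale, you pick the scale that minimizes importance-weighted error, where the importance vector stands in for the activation statistics a real imatrix would provide:

```python
# Toy sketch: importance-weighted scale search for 8-bit quantization.
# The "importance" vector is fake; a real imatrix would come from activations.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4096).astype(np.float32)
imp = rng.exponential(size=w.size).astype(np.float32)  # fake importances

def weighted_err(scale, weights):
    q = np.round(w / scale).clip(-127, 127)  # 8-bit symmetric
    return float(np.sum(weights * (w - q * scale) ** 2))

base = np.abs(w).max() / 127                # plain max-abs scale
cands = base * np.linspace(0.8, 1.1, 61)    # search around the baseline
best = min(cands, key=lambda s: weighted_err(s, imp))
print("weighted error, plain scale :", weighted_err(base, imp))
print("weighted error, tuned scale :", weighted_err(best, imp))
```

Whether that gain would actually be measurable for real vision layers at Q8 is exactly my question.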