Is it possible to create a version without the MTP layer to save some VRAM?

#1
by adonishong - opened

Thanks for your work. As the title says, is it possible to create a version without the MTP layer to save some VRAM?

I think vLLM currently breaks with a quantized MTP layer, so would it break compatibility?
