Is it possible to create a version without the MTP layer to save some VRAM?
#1, opened by adonishong
Thanks for your work. As the title says, is it possible to create a version without the MTP layer to save some VRAM?
I think vLLM currently breaks with a quantized MTP layer, so it would break compatibility?