Quant'd incorrectly
#1 opened by shambler74
Getting an error when attempting to load this FP8 model in vLLM:
RuntimeError: size_n = 2736 is not divisible by tile_n_size = 64
The Marlin kernels that vLLM uses for FP8 require tensor dimensions, in particular the output dimension size_n, to be divisible by the kernel's tile size, which is 64 according to the error above. Here 2736 = 42 × 64 + 48, so the check fails.
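For anyone who wants to sanity-check a layer size before loading, here is a minimal sketch of the divisibility check in plain Python. The function name `check_tile_alignment` is hypothetical and not part of vLLM; the tile size of 64 is taken from the error message, and the "padded" value only shows the next multiple of 64, not a confirmed fix for this checkpoint.

```python
from typing import Tuple


def check_tile_alignment(size_n: int, tile_n_size: int = 64) -> Tuple[bool, int, int]:
    """Hypothetical helper (not a vLLM API): test whether size_n is a multiple of tile_n_size.

    Returns (is_aligned, remainder, next_multiple), where next_multiple is the
    smallest multiple of tile_n_size that is >= size_n.
    """
    remainder = size_n % tile_n_size
    if remainder == 0:
        next_multiple = size_n
    else:
        next_multiple = size_n + (tile_n_size - remainder)
    return remainder == 0, remainder, next_multiple


# The dimension reported in the error above.
is_aligned, remainder, next_multiple = check_tile_alignment(2736)
print(is_aligned, remainder, next_multiple)  # False 48 2752
```

If the remainder is non-zero, as it is for 2736, that weight's output dimension cannot be split into whole 64-wide tiles, which is what the RuntimeError is reporting.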