Quant'd incorrectly

#1
by shambler74 - opened

Getting an error when attempting to load this FP8 quant in vLLM:

RuntimeError: size_n = 2736 is not divisible by tile_n_size = 64

The Marlin kernels that vLLM uses for FP8 require tensor dimensions, in particular the output dimension (size_n), to be divisible by the kernel's tile size, which is 64 here. 2736 = 42 × 64 + 48, so this weight fails the check.
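For anyone who wants to check other layer sizes, here is a minimal sketch of the same divisibility test in plain Python. The helper name `check_tile_alignment` is made up for illustration and is not part of vLLM; the default tile size of 64 is taken from the error message above.

```python
def check_tile_alignment(size_n: int, tile_n_size: int = 64) -> tuple[bool, int, int]:
    """Hypothetical helper (not a vLLM API): test whether size_n is a multiple of tile_n_size.

    Returns (is_aligned, remainder, next_multiple), where next_multiple is the
    smallest multiple of tile_n_size that is >= size_n.
    """
    remainder = size_n % tile_n_size
    if remainder == 0:
        next_multiple = size_n
    else:
        next_multiple = size_n + (tile_n_size - remainder)
    return remainder == 0, remainder, next_multiple


# The dimension from the error message above.
ok, remainder, next_multiple = check_tile_alignment(2736)
print(ok, remainder, next_multiple)
```

The third value only shows the nearest aligned size. Whether padding to it is a workable fix for this checkpoint is not something I have verified.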
