Quant'd incorrectly

by shambler74 - opened Jul 29

Jul 29

Getting an error when attempting to load this FP8 model in vLLM:

RuntimeError: size_n = 2736 is not divisible by tile_n_size = 64

The Marlin kernels that vLLM uses for FP8 require certain tensor dimensions, in particular the output dimension size_n, to be divisible by the kernel's tile size, which is 64 here. 2736 = 42 × 64 + 48, so this weight does not fit a whole number of tiles.
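For anyone who wants to check a dimension by hand, here is a minimal sketch of the arithmetic behind the error. It is not vLLM code: the helper names `tile_remainder` and `next_tile_multiple` are made up for illustration, and the tile size of 64 is simply taken from the error message above.

```python
# Hypothetical helpers, not part of vLLM; tile size 64 is taken from the error message.

def tile_remainder(size_n: int, tile_n_size: int = 64) -> int:
    """Return how many columns are left over after filling whole tiles."""
    return size_n % tile_n_size


def next_tile_multiple(size_n: int, tile_n_size: int = 64) -> int:
    """Return the smallest multiple of tile_n_size that is >= size_n."""
    return -(-size_n // tile_n_size) * tile_n_size


if __name__ == "__main__":
    # The two size_n values reported in this thread.
    for size_n in (2736, 5472):
        rem = tile_remainder(size_n)
        status = "ok" if rem == 0 else f"not divisible, remainder {rem}"
        print(f"size_n={size_n}: {status}; next multiple of 64 is {next_tile_multiple(size_n)}")
```

A nonzero remainder means the dimension cannot be split into whole 64-wide tiles, which is the condition the RuntimeError is reporting.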

foyoux

Aug 30

RuntimeError: size_n = 5472 is not divisible by tile_n_size = 64

xhaozeng

Sep 19

same error
