- How about int8 quantization? — #3, opened about 2 months ago by traphix
- INT 8 — #2, opened about 2 months ago by freegheist
- Slow inference on vLLM (3 replies) — #1, opened 2 months ago by hp1337