Update inference/bf16_cast_channel_int8.py
#10 opened about 10 hours ago by HandH1998
Update config.json
#9 opened about 10 hours ago by HandH1998
How to achieve 2500 TPS throughput?
#8 opened 1 day ago by muziyongshixin
Can this model run with `ollama` in `pure CPU` mode?
#7 opened 4 days ago by ice6
Add `quantization_config` in config.json?
#4 opened 7 days ago by WeiwenXia · 4 comments
sglang reports OOM after running channel INT8
#3 opened 9 days ago by zhangneilc · 1 comment