Remove quantization_config from config.json
#1 · opened by alphatozeta
Removing this field to match the R1 config.json. Having it causes vLLM and SGLang to throw an error, because they think we're trying to run an FP8-quantized model in FP4 precision.
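For context, the offending block looks along these lines (an illustrative excerpt, not a verbatim copy of the checkpoint's config; the loaders read `quant_method` from it and infer FP8, per the error below):

```json
{
  "model_type": "deepseek_v3",
  "quantization_config": {
    "quant_method": "fp8"
  }
}
```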
How to reproduce?
huggingface-cli download nvidia/DeepSeek-V3-0324-FP4 --local-dir v3
python3 -m sglang.launch_server --model-path ./v3 --trust-remote-code --quantization modelopt_fp4 --tp 8 --enable-flashinfer-moe
ValueError: Quantization method specified in the model config (fp8) does not match the quantization method defined in the 'quantization' argument (modelopt_fp4)
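Until this change lands, a minimal local workaround is to strip the block from the downloaded config yourself, then relaunch with `--quantization modelopt_fp4` as above. A sketch, assuming the `./v3` local dir from the download step:

```python
import json

# Path assumes the `huggingface-cli download ... --local-dir v3` step above.
path = "v3/config.json"

with open(path) as f:
    cfg = json.load(f)

# Drop the block this PR removes; the quantization method is instead
# selected explicitly at launch via --quantization modelopt_fp4.
cfg.pop("quantization_config", None)

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```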