Remove quantization_config from config.json

#1

Removing this field to match the R1 config.json. Having it causes vLLM and SGLang to throw an error because they think we're trying to run an FP8-quantized model in FP4 precision.
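For reference, a sketch of the offending entry in config.json (other keys omitted; the `quant_method: fp8` value is inferred from the error message below, and the exact contents in this repo may differ):

```json
{
  "quantization_config": {
    "quant_method": "fp8"
  }
}
```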
How to reproduce:
huggingface-cli download nvidia/DeepSeek-V3-0324-FP4 --local-dir v3
python3 -m sglang.launch_server --model-path ./v3 --trust-remote-code --quantization modelopt_fp4 --tp 8 --enable-flashinfer-moe

ValueError: Quantization method specified in the model config (fp8) does not match the quantization method defined in the 'quantization' argument (modelopt_fp4)
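Until this change lands, a minimal local workaround sketch is to strip the field from the downloaded copy before launching the server (assuming the `./v3` download location from the repro above; the path is illustrative):

```python
import json
from pathlib import Path

# Path to the local download from the repro step above (adjust as needed).
config_path = Path("v3/config.json")

config = json.loads(config_path.read_text())

# Drop the stale quantization_config so the serving framework uses the
# quantization method passed on the command line (modelopt_fp4 here)
# instead of the fp8 method read from the model config.
removed = config.pop("quantization_config", None)
if removed is not None:
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    print("Removed quantization_config:", removed)
else:
    print("No quantization_config field found; nothing to do.")
```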

