alphatozeta committed on
Commit
8c765e2
·
verified ·
1 Parent(s): 28b46b4

Remove quantization_config from config.json


Removing this field to match the R1 config.json. Having it causes vLLM and SGLang to throw an error because they think we're trying to run an fp8-quantized model in fp4 precision.
How to reproduce:
huggingface-cli download nvidia/DeepSeek-V3-0324-FP4 --local-dir v3
python3 -m sglang.launch_server --model-path ./v3 --trust-remote-code --quantization modelopt_fp4 --tp 8 --enable-flashinfer-moe

ValueError: Quantization method specified in the model config (fp8) does not match the quantization method defined in the 'quantization' argument (modelopt_fp4)
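Until the fix is pulled, the same edit can be applied to a local copy of the checkpoint: delete the `quantization_config` key so the serving engine relies on the `--quantization modelopt_fp4` flag instead of the config's fp8 method. A minimal Python sketch (the stand-in config below is abbreviated and the temp-file setup is only for illustration; on a real checkout you would point `path` at v3/config.json):

```python
import json
import os
import tempfile

# Abbreviated stand-in for the downloaded config.json (the real file has many more keys).
cfg = {
    "q_lora_rank": 1536,
    "quantization_config": {
        "activation_scheme": "dynamic",
        "fmt": "e4m3",
        "quant_method": "fp8",
        "weight_block_size": [128, 128],
    },
    "rms_norm_eps": 1e-06,
}
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "config.json")
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

# Drop the stale fp8 block, then write the config back in place.
with open(path) as f:
    cfg = json.load(f)
cfg.pop("quantization_config", None)
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

print("quantization_config" in cfg)  # → False
```

With the key removed, the `--quantization modelopt_fp4` argument no longer conflicts with the model config.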

Files changed (1)
  1. config.json +0 -9
config.json CHANGED
@@ -36,15 +36,6 @@
   "q_lora_rank": 1536,
   "qk_nope_head_dim": 128,
   "qk_rope_head_dim": 64,
-  "quantization_config": {
-    "activation_scheme": "dynamic",
-    "fmt": "e4m3",
-    "quant_method": "fp8",
-    "weight_block_size": [
-      128,
-      128
-    ]
-  },
   "rms_norm_eps": 1e-06,
   "rope_scaling": {
     "beta_fast": 32,
 