alphatozeta committed on
Commit
8c765e2
·
verified ·
1 Parent(s): 28b46b4

Remove quantization_config from config.json


Removing this field to match the R1 config.json. Having it causes vLLM and SGLang to throw an error because they think we're trying to run an fp8-quantized model in fp4 precision.
How to reproduce:
huggingface-cli download nvidia/DeepSeek-V3-0324-FP4 --local-dir v3
python3 -m sglang.launch_server --model-path ./v3 --trust-remote-code --quantization modelopt_fp4 --tp 8 --enable-flashinfer-moe

ValueError: Quantization method specified in the model config (fp8) does not match the quantization method defined in the 'quantization' argument (modelopt_fp4)
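Until the fix is pulled, the same edit can be applied to a local copy of the checkpoint: delete the `quantization_config` key so the serving engine relies on the `--quantization modelopt_fp4` flag instead of the config's fp8 method. A minimal Python sketch (the stand-in config below is abbreviated and the temp-file setup is only for illustration; on a real checkout you would point `path` at v3/config.json):

```python
import json
import os
import tempfile

# Abbreviated stand-in for the downloaded config.json (the real file has many more keys).
cfg = {
    "q_lora_rank": 1536,
    "quantization_config": {
        "activation_scheme": "dynamic",
        "fmt": "e4m3",
        "quant_method": "fp8",
        "weight_block_size": [128, 128],
    },
    "rms_norm_eps": 1e-06,
}
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "config.json")
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

# Drop the stale fp8 block, then write the config back in place.
with open(path) as f:
    cfg = json.load(f)
cfg.pop("quantization_config", None)
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

print("quantization_config" in cfg)  # → False
```

With the key removed, the `--quantization modelopt_fp4` argument no longer conflicts with the model config.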

Files changed (1)
  1. config.json +0 -9
config.json CHANGED
@@ -36,15 +36,6 @@
   "q_lora_rank": 1536,
   "qk_nope_head_dim": 128,
   "qk_rope_head_dim": 64,
-  "quantization_config": {
-    "activation_scheme": "dynamic",
-    "fmt": "e4m3",
-    "quant_method": "fp8",
-    "weight_block_size": [
-      128,
-      128
-    ]
-  },
   "rms_norm_eps": 1e-06,
   "rope_scaling": {
     "beta_fast": 32,
 