DeepSeek-R1-AWQ quantized model missing one layer of experts
Hi everyone,
I downloaded the quantized DeepSeek-R1-AWQ model from Hugging Face, and upon inspecting the model.safetensors.index.json, I noticed something curious: the expert layers start at layer 3, matching the original DeepSeek-R1, but they stop at layer 60. The original model clearly has experts all the way up to layer 61.
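For reference, this is roughly how I checked (a quick script; the local path is just an example, and I'm assuming the routed experts follow the usual model.layers.N.mlp.experts.* naming):

```python
import json
import re

# Index file shipped with the AWQ checkpoint (example local path).
with open("DeepSeek-R1-AWQ/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Collect the layer indices that contain routed-expert tensors.
expert_layers = set()
for name in weight_map:
    m = re.match(r"model\.layers\.(\d+)\.mlp\.experts\.", name)
    if m:
        expert_layers.add(int(m.group(1)))

print(f"expert layers: {min(expert_layers)}..{max(expert_layers)}")
# AWQ checkpoint: 3..60; the original DeepSeek-R1 index goes up to 61.
```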
Does anyone know why the last expert layer (layer 61) is missing in this quantized version? Is this intentional, and what exactly was removed? Is there any quantized model that can be run with SGLang/vLLM on 8x80GB H100 GPUs while retaining all the expert layers?
Thanks in advance for any insights!
The last layer is used for speculative decoding (MTP). Since AutoAWQ doesn't support quantizing the MTP head yet, it was excluded during quantization. The model should work as intended, just a little bit slower.
thanks @v2ray!
It looks like layer #61 in DeepSeek-R1 contains both a standard MoE expert block (with the same structure as layers #3-60) and an additional MTP-specific block. The MTP block appears to make up the final part of the layer and includes:
"model.layers.N.embed_tokens.weight",
"model.layers.N.enorm.weight",
"model.layers.N.hnorm.weight",
"model.layers.N.eh_proj.weight",
"model.layers.N.shared_head.norm.weight",
"model.layers.N.shared_head.head.weight",
Since AutoAWQ does not support quantizing MTP, it removed layer #61 entirely. However, this also means that the MoE expert block in layer #61 was removed along with the MTP block.
I'm using this file as a reference, where you can see that layer 61 also contains experts apart from the MTP weights: https://huggingface.co/deepseek-ai/DeepSeek-R1/raw/main/model.safetensors.index.json
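For example, splitting the layer-61 entries of that index by name shows both groups (a quick sketch; the path assumes the JSON has been downloaded locally, and the MTP markers are taken from the list above):

```python
import json

# Original DeepSeek-R1 index from the URL above, downloaded locally.
with open("DeepSeek-R1/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

layer61 = [n for n in weight_map if n.startswith("model.layers.61.")]

# Tensor names that only appear in the MTP block (see the list above).
mtp_markers = ("embed_tokens", "enorm", "hnorm", "eh_proj", "shared_head")
mtp_only = [n for n in layer61 if any(m in n for m in mtp_markers)]
experts = [n for n in layer61 if ".mlp.experts." in n]

print(f"layer 61 tensors:  {len(layer61)}")
print(f"  MTP-specific:    {len(mtp_only)}")
print(f"  routed experts:  {len(experts)}")
```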
Just to confirm: were the MoE experts in layer #61 actually tied to MTP, or should they have been treated as independent of it (like the experts in layers #3-60)? If they were independent, could this mean that AutoAWQ unintentionally removed standard MoE functionality along with MTP?
The entire final layer is dedicated to MTP, so removing it is OK. If you check their official weight conversion code, you'll see that they discard the final layer too.
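To illustrate the idea (just a sketch of the filtering step, not their actual conversion script; the layer index and file path are taken from this thread):

```python
import json

MTP_LAYER = 61  # final layer, which holds the MTP head (per this thread)

with open("DeepSeek-R1/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Keep only main-model tensors; everything under the MTP layer is dropped.
kept = {
    name: shard
    for name, shard in weight_map.items()
    if not name.startswith(f"model.layers.{MTP_LAYER}.")
}
print(f"dropped {len(weight_map) - len(kept)} tensors from layer {MTP_LAYER}")
```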
thanks a lot for clarifying!