Does Qwen use RMSNorm or LayerNorm?

#21
by hafezmg48 - opened

I noticed that in the config file the epsilon parameter is named "layer_norm_epsilon" rather than "rms_norm_epsilon", so I assumed Qwen uses LayerNorm. However, when I checked the "modeling_qwen.py" code, I found that RMSNorm is what actually gets called. So, just to be clear: Qwen uses RMSNorm, right?

Qwen org

The Qwen series uses RMSNorm. Please consider using Qwen2.
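For readers comparing the two, here is a minimal pure-Python sketch of how RMSNorm and LayerNorm differ. It is only an illustration and is not taken from "modeling_qwen.py"; the function names `rms_norm` and `layer_norm`, the default `eps` value, and the sample input are assumptions made for this example. In either case the epsilon is just a small constant added for numerical stability, so a config key named "layer_norm_epsilon" can be passed to an RMSNorm layer, as the question observed.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square; no mean subtraction, no bias."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def layer_norm(x, weight, bias, eps=1e-6):
    """LayerNorm: subtract the mean, divide by the std, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [w * (v - mean) * inv_std + b for v, w, b in zip(x, weight, bias)]


# Hypothetical input vector with unit scale and zero shift.
x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)
zeros = [0.0] * len(x)

rms_out = rms_norm(x, ones)          # keeps the mean, fixes the RMS to ~1
ln_out = layer_norm(x, ones, zeros)  # re-centers to mean ~0, variance ~1
```

The practical difference is that RMSNorm skips the mean-centering step and has no bias term, so its output keeps a nonzero mean while LayerNorm's output is centered at zero.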
