Does Qwen use RMSNorm or LayerNorm?
#21, opened by hafezmg48
I noticed that in the config file the epsilon parameter is named "layer_norm_epsilon" rather than "rms_norm_epsilon", so I assumed that Qwen uses LayerNorm. But when I checked the "modeling_qwen.py" code, I found that an RMSNorm function is being called. So, just to be clear, I wanted to ask: Qwen uses RMSNorm, right?
The Qwen series uses RMSNorm. Please consider using Qwen2.
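For anyone comparing the two, here is a minimal pure-Python sketch of the textbook definitions. It is not the code from "modeling_qwen.py"; the function names `rms_norm` and `layer_norm` and the default `eps` value are assumptions made for illustration. Both formulas add an epsilon under the square root for numerical stability, which may be why a config key named "layer_norm_epsilon" can still feed an RMSNorm layer.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root mean square of x.
    # No mean subtraction and no bias term.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]

def layer_norm(x, weight, bias, eps=1e-6):
    # LayerNorm: subtract the mean, divide by the standard deviation,
    # then apply a learned scale and bias.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [w * (v - mean) * inv_std + b for v, w, b in zip(x, weight, bias)]

# Illustrative input (hypothetical values, not taken from the model).
x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)
zeros = [0.0] * len(x)

rms_out = rms_norm(x, ones)          # keeps the sign and offset of x
ln_out = layer_norm(x, ones, zeros)  # re-centred around zero
```

With unit weights and zero bias, the LayerNorm output sums to roughly zero because the mean has been removed, while the RMSNorm output stays all positive for this input and has a mean square close to one.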