Update 1_Pooling/config.json
Context: Embedding models with a causal mask only have the full attention context at the last token, so other pooling methods do not make sense!
Thank you for your contributions and suggestions.
As our model is trained using mean pooling, we also strive to utilize mean pooling during inference as much as possible.
Despite employing an autoregressive model, mean pooling can effectively capture sentence semantics. Through experiments, we have found that mean pooling slightly outperforms last token pooling, a point also discussed in the technical report.
If necessary, we will consider releasing a model trained using last token pooling at a later date. Alternatively, you are welcome to train one yourself using the training code we provide.
In conclusion, I will close this merge request. Thank you once again for your advice and support.
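For readers comparing the two strategies discussed in this reply, here is a minimal sketch of mean pooling and last token pooling over toy hidden states. It is not the model's actual inference code: the function names `mean_pool` and `last_token_pool` and the toy values are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
from typing import List

Vector = List[float]


def mean_pool(hidden_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the hidden states of all non-padding tokens (mask == 1)."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


def last_token_pool(hidden_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Return the hidden state of the last non-padding token (mask == 1)."""
    for h, m in zip(reversed(hidden_states), reversed(attention_mask)):
        if m == 1:
            return list(h)
    raise ValueError("attention_mask selects no tokens")


# Toy sequence (hypothetical values): three real tokens plus one padding token.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

mean_vec = mean_pool(hidden, mask)        # averages the first three vectors
last_vec = last_token_pool(hidden, mask)  # picks the third vector
print(mean_vec, last_vec)
```

In sentence-transformers, the choice between these modes is stored as boolean flags such as `pooling_mode_mean_tokens` and `pooling_mode_lasttoken` in the Pooling module's config. The exact change proposed to 1_Pooling/config.json is not shown in this thread, so treat that mapping as an assumption.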
From a theoretical perspective, mean pooling will reflect the input at position 0 n times and the last token only once. But staying with mean pooling retains the capabilities of the original training better, so that makes sense! I mainly opened this as a PR rather than a discussion so I can clone from refs/pr/1 for further exploration.
Thanks for the prompt response, and congrats on the release. Beyond that, I'm happy to see that you are compatible with upstream QWen2Model and did not ship remote code!
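The "position 0 n times, last token once" argument above can be made concrete with a small counting sketch. It assumes a strictly causal mask and only counts which hidden states an input position can reach, not how strongly attention actually weights it. The helper names `causal_visibility` and `mean_pool_counts` are hypothetical.

```python
from typing import List


def causal_visibility(n: int) -> List[List[int]]:
    """visible[i] lists the input positions that hidden state i can attend to."""
    return [list(range(i + 1)) for i in range(n)]


def mean_pool_counts(n: int) -> List[int]:
    """Count how many of the n hidden states each input position can influence.

    Mean pooling averages all n hidden states, so this count is how often
    each input position is "reflected" in the pooled vector.
    """
    counts = [0] * n
    for visible in causal_visibility(n):
        for j in visible:
            counts[j] += 1
    return counts


counts = mean_pool_counts(5)
print(counts)  # [5, 4, 3, 2, 1]: position 0 appears n times, the last token once
```

Last token pooling corresponds to using only the final row of `causal_visibility(n)`, in which every input position appears exactly once.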