Make compatible to sentence-transformers

#10
No description provided.
michaelfeil changed pull request title from Create 1_Pooling/config.json to Make compatible to sentence-transformers

@Shitao Can you merge this? It makes the model compatible with last-token pooling in sentence-transformers.
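For readers unfamiliar with the term, here is a minimal sketch of what last-token pooling does, in plain Python rather than the actual sentence-transformers code. The function name `last_token_pool` and the toy batch are hypothetical, chosen only for illustration: for each sequence, the embedding of the last non-padding token is taken as the sentence embedding.

```python
from typing import List


def last_token_pool(
    token_embeddings: List[List[List[float]]],
    attention_mask: List[List[int]],
) -> List[List[float]]:
    """Return, for each sequence, the embedding of its last non-padding token."""
    pooled = []
    for embeddings, mask in zip(token_embeddings, attention_mask):
        # Position of the last token with mask == 1.
        # This works for both left-padded and right-padded sequences.
        last_index = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(embeddings[last_index])
    return pooled


# Toy batch (illustrative values): 2 sequences, 3 positions, 2-dim embeddings.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # right-padded, 2 real tokens
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],  # no padding
]
mask = [[1, 1, 0], [1, 1, 1]]

pooled = last_token_pool(batch, mask)
print(pooled)
```

The sketch assumes every sequence has at least one real token; an all-zero mask row would raise a `ValueError`.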

@michaelfeil Thank you for making these changes. I believe they will benefit model serving.

One question regarding max_seq_length: although it is not officially documented, the bge-en-icl context window appears to be 32,768 according to the MTEB leaderboard. Is there a reason you set max_seq_length to 4,096 in this change?

@starsy Fair point. 4096 would only be the default max length when loading with sentence-transformers.

32768 will lead to an OOM for some users, yet 32768 is technically correct.

Beyond that, the Hugging Face transformers implementation uses a sliding window of 4096. For inputs longer than 4096 tokens, you need the flash-attn CUDA extension installed to get correct output; otherwise the output is silently incorrect, because torch.sdpa does not support causal forward attention with window_size=4096.
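To illustrate why the window only matters past 4096 tokens, here is a toy sketch in plain Python that compares a plain causal mask with a sliding-window causal mask. The helper names and the tiny sizes are hypothetical, and the boundary convention (each token sees itself plus the previous `window - 1` tokens) is an assumption; the real implementation may differ by one position.

```python
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_causal_mask(n: int, window: int) -> List[List[bool]]:
    """Causal mask that additionally hides keys more than window - 1 steps back.

    Assumed convention: token i sees tokens i - window + 1 .. i.
    """
    return [[j <= i and i - j < window for j in range(n)] for i in range(n)]


window = 4  # stands in for 4096

# Sequence no longer than the window: both masks are identical.
same_when_short = sliding_window_causal_mask(4, window) == causal_mask(4)

# Longer sequence: the masks differ, so plain causal attention
# attends to tokens the windowed variant would hide.
differs_when_long = sliding_window_causal_mask(6, window) != causal_mask(6)

# Last row of the windowed mask for a length-6 sequence.
last_row = sliding_window_causal_mask(6, window)[5]

print(same_when_short, differs_when_long, last_row)
```

For sequences up to the window length the two masks match, so an attention backend that ignores the window still gives the same result. Past that length the masks diverge without raising any error, which is the silent mismatch described above.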

Leaving in 32768 for now! @Shitao appreciate your review.

@Shitao Can you please review?

Beijing Academy of Artificial Intelligence org

Thanks for your contribution! @michaelfeil

Shitao changed pull request status to merged
