Add padding token to config (fix batched generation)
Fixes the error "Cannot handle batch sizes > 1 if no padding token is defined".
Tested with:
infinity_emb v2 --model-id michaelfeil/mxbai-rerank-large-v2-seq --device mps --engine torch --no-model-warmup --revision refs/pr/1
Can confirm, this does the trick. Thanks for the fix-up!
Makes sense. Have you checked if it affects accuracy if you run e.g. a batch-size 1 vs batch-size 2 scenario?
Hmm, something does seem off with accuracy, but I'm not sure yet whether it's the batching. I'm getting worse results than bge-m3-v2.
Are you actually applying the chat template when sending classification requests? How about batch-size 1?
Oops, I accidentally merged this.
I'll add that the original model outputs logit scores in a range of -10 to +10. Something seems off with score normalization when this model is used as a classifier, since most inputs score 0.99 or higher.
This could be an issue with the padding, the Qwen2Tokenizer, or infinity itself. Just figured I’d report, as using it via standard transformers yields accurate results.
Inverse sigmoid wouldn't change any relative rankings, right?
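For what it's worth, a quick way to convince yourself of this: sigmoid and its inverse are both strictly increasing, so sorting by logits or by probabilities should give the same order. A minimal sketch, where the raw scores are made-up values for illustration and not outputs of this model:

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse sigmoid: map a probability in (0, 1) back to a logit."""
    return math.log(p / (1.0 - p))

def ranking(scores):
    """Indices of `scores` sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical raw scores, chosen only to illustrate the point.
raw_scores = [3.2, -1.5, 0.4, 7.9]
probabilities = [sigmoid(x) for x in raw_scores]
recovered = [logit(p) for p in probabilities]

print(ranking(raw_scores))
print(ranking(probabilities))
print(ranking(recovered))
```

All three printed rankings should be identical. The caveat is numerical: in low precision, very large logits can saturate to the same probability, at which point ties can appear.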
Are you actually applying the chat template when sending classification requests? How about batch-size 1?
Yep, based on your script linked in the GH issue. I'll see if I can get to testing with batch size 1. Results are definitely not great with larger batch sizes (worse than bge-m3-v2 on the NanoBEIR subsets I tested).
<|endoftext|><|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
<|im_end|>
<|im_start|>user
query: %s
document: %s
You are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).
Relevance:<|im_end|>
<|im_start|>assistant
print(results)
[RankResult(index=0, score=12.063760757446289, document="'To Kill a Mockingbird' is a novel by Harper Lee published in 1960."), RankResult(index=2, score=9.638206481933594, document='Harper Lee was born in 1926 in Monroeville, Alabama.'), RankResult(index=1, score=-1.3470792770385742, document="The novel 'Moby-Dick' was written by Herman Melville.")]
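In case it helps anyone reproduce this, here is a rough sketch of filling the two %s slots of the template above and building a request body for /classify. The helper name build_classify_payload is hypothetical. The single space before each newline after the query and the document mirrors the curl payload in this thread; whether that whitespace matters for scoring is an assumption I have not verified.

```python
import json

# Template text copied from this thread. The space before "\n" after each %s
# follows the curl payload and is an untested detail.
TEMPLATE = (
    "<|endoftext|><|im_start|>system\n"
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "query: %s \n"
    "document: %s \n"
    "You are a search relevance expert who evaluates how well documents match "
    "search queries. For each query-document pair, carefully analyze the "
    "semantic relationship between them, then provide your binary relevance "
    "judgment (0 for not relevant, 1 for relevant).\n"
    "Relevance:<|im_end|>\n"
    "<|im_start|>assistant\n"
)

def build_classify_payload(query, documents, raw_scores=False):
    """Hypothetical helper: one templated prompt per document, in input order."""
    return {
        "input": [TEMPLATE % (query, doc) for doc in documents],
        "raw_scores": raw_scores,
    }

payload = build_classify_payload(
    "Who wrote 'To Kill a Mockingbird'?",
    [
        "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
        "The novel 'Moby-Dick' was written by Herman Melville.",
        "Harper Lee was born in 1926 in Monroeville, Alabama.",
    ],
)
body = json.dumps(payload)  # JSON string to send as the POST body
```

Building the body with json.dumps sidesteps the shell quoting needed when the prompts are pasted into a curl command by hand.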
The inverse sigmoid values from infinity do not match these:
curl -X 'POST' \
'https://<endpoint>/classify' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"input": [
"<|endoftext|><|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n<|im_end|>\n<|im_start|>user\nquery: Who wrote '\''To Kill a Mockingbird'\''? \ndocument: '\''To Kill a Mockingbird'\'' is a novel by Harper Lee published in 1960. \nYou are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:<|im_end|>\n<|im_start|>assistant\n",
"<|endoftext|><|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n<|im_end|>\n<|im_start|>user\nquery: Who wrote '\''To Kill a Mockingbird'\''? \ndocument: The novel '\''Moby-Dick'\'' was written by Herman Melville. \nYou are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:<|im_end|>\n<|im_start|>assistant\n",
"<|endoftext|><|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n<|im_end|>\n<|im_start|>user\nquery: Who wrote '\''To Kill a Mockingbird'\''? \ndocument: Harper Lee was born in 1926 in Monroeville, Alabama. \nYou are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:<|im_end|>\n<|im_start|>assistant\n"
],
"raw_scores": false
}'
{"object":"classify","data":[[{"score":0.9999667406082153,"label":"1"},{"score":0.00003321419353596866,"label":"0"}],[{"score":0.894789457321167,"label":"1"},{"score":0.10521053522825241,"label":"0"}],[{"score":0.9996947050094604,"label":"1"},{"score":0.00030534894904121757,"label":"0"}]],"model":"michaelfeil/mxbai-rerank-large-v2-seq","usage":{"prompt_tokens":1627,"total_tokens":1627},"id":"infinity-ddd4593f-5afa-45e3-a094-1ca25713605b","created":1743104764}
ln(0.9999667406082153/(1-0.9999667406082153)) = 10.3111401111
ln(0.894789457321167/(1-0.894789457321167)) = 2.14062493652
ln(0.9996947050094604/(1-0.9996947050094604)) = 8.09392672508
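The same comparison as a small script, for anyone who wants to rerun it. It takes the label "1" probabilities from the /classify response above (in input order) and the scores from the RankResult output earlier in the thread (keyed by index), then prints the recovered logit next to the reference score. The variable names are my own.

```python
import math

def logit(p):
    """Inverse sigmoid: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def ranking(scores):
    """Indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Label "1" probabilities from the infinity /classify response, in input order.
infinity_probs = [0.9999667406082153, 0.894789457321167, 0.9996947050094604]

# Scores from the RankResult output, reordered by document index 0, 1, 2.
reference_scores = [12.063760757446289, -1.3470792770385742, 9.638206481933594]

infinity_logits = [logit(p) for p in infinity_probs]

for i, (got, ref) in enumerate(zip(infinity_logits, reference_scores)):
    print(f"doc {i}: infinity logit={got:.4f} reference={ref:.4f} diff={got - ref:+.4f}")

print("same ranking:", ranking(infinity_logits) == ranking(reference_scores))
```

On these three documents the ranking is the same in both outputs (0, 2, 1), but the values differ, most of all for the Moby-Dick document, which gets a positive logit from infinity and a negative score from the reference.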
I am confused. What are you trying to do there? Why are you using /rerank? Rerank is not supported in infinity with this model. Do not use /rerank.
Wdym? I'm using /classify, see the code snippet. My point is that the infinity results do not match the mixedbread implementation.
Are you loading the model in bf16?