Add padding token to config (fix batched generation)

by rawsh - opened 5 days ago

base: refs/heads/main ← from: refs/pr/1

rawsh

5 days ago

Fixes the error "Cannot handle batch sizes > 1 if no padding token is defined".

tested with:

infinity_emb v2 --model-id michaelfeil/mxbai-rerank-large-v2-seq --device mps --engine torch --no-model-warmup --revision refs/pr/1
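For context, this error is typically raised by sequence-classification heads on decoder-only models, which pool the hidden state at the last non-padding token of each row. Without a known padding token, that position is ambiguous for any batch larger than one. The following is a simplified pure-Python sketch of the idea, not the actual transformers code; the helper name `last_token_positions`, the right-padding assumption, and the pad id `0` are all illustrative assumptions.

```python
from typing import List, Optional


def last_token_positions(input_ids: List[List[int]], pad_token_id: Optional[int]) -> List[int]:
    """Return, per row, the index of the last non-padding token (right padding assumed)."""
    if pad_token_id is None:
        # Without a pad token we cannot tell padding from real tokens,
        # so only a single unpadded sequence is unambiguous.
        if len(input_ids) > 1:
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        return [len(input_ids[0]) - 1]

    positions = []
    for row in input_ids:
        idx = len(row) - 1
        # Walk left past trailing padding tokens.
        while idx > 0 and row[idx] == pad_token_id:
            idx -= 1
        positions.append(idx)
    return positions


PAD = 0  # hypothetical pad id, for illustration only
batch = [[5, 6, 7, PAD, PAD], [8, 9, 10, 11, 12]]
print(last_token_positions(batch, PAD))
```

If the configured padding token also occurs as a real token at the end of a sequence, or if the padding side differs from what the pooling logic expects, the pooled position can shift, which is one reason to compare batch size 1 against larger batches.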

Add padding token to config (fix batched generation) d3e76ede

prolix-oc

5 days ago

Can confirm, this does the trick. Thanks for the fix-up!

michaelfeil

Owner 5 days ago

https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct/blob/389cdda80172f0f04f04299265f22762f44039b7/tokenization_qwen.py#L227

Makes sense. Have you checked whether it affects accuracy, e.g. by comparing a batch-size-1 run against a batch-size-2 run?

rawsh

4 days ago

Hmm, something does seem weird with accuracy, but I'm not sure yet if it's the batching. I'm getting worse results than bge-m3-v2.

michaelfeil

Owner 4 days ago

Are you actually applying the chat template when sending classification requests? How about batch size 1?

michaelfeil changed pull request status to merged 4 days ago

michaelfeil

Owner 4 days ago

Oops, I accidentally merged this.

prolix-oc

4 days ago

I will add: the original model uses a logit output scoring range of -10 to +10 as well. Something is occurring with the normalization of scoring with this model as a classifier, as most things score 0.99 or higher.

This could be an issue with the padding, the Qwen2Tokenizer, or infinity itself. Just figured I’d report, as using it via standard transformers yields accurate results.

michaelfeil

Owner 4 days ago

Can you just inverse sigmoid these numbers aka:

rawsh

4 days ago

edited 4 days ago

Inverse sigmoid wouldn't change any relative rankings right?

"Are you actually applying the chat template when sending classification requests? How about batch size 1?"

Yep, based on your script linked in the GH issue. I'll see if I can get to testing with batch size 1. Results are definitely not great with larger batch sizes (worse than bge-m3-v2 on the NanoBEIR subsets I tested).

<|endoftext|><|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
<|im_end|>
<|im_start|>user
query: %s 
document: %s 
You are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).
Relevance:<|im_end|>
<|im_start|>assistant
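As a minimal sketch, the template above can be filled in like this before sending strings to the classification endpoint. The names `PROMPT_TEMPLATE` and `build_classify_input` are hypothetical; the trailing newline after `assistant` and the space before each line break after the `%s` slots follow the request strings quoted in this thread.

```python
# Template text copied from the thread; the two %s slots are query and document.
PROMPT_TEMPLATE = (
    "<|endoftext|><|im_start|>system\n"
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "query: %s \n"
    "document: %s \n"
    "You are a search relevance expert who evaluates how well documents match search queries. "
    "For each query-document pair, carefully analyze the semantic relationship between them, "
    "then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\n"
    "Relevance:<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def build_classify_input(query: str, document: str) -> str:
    """Fill the query and document into the prompt template."""
    return PROMPT_TEMPLATE % (query, document)


query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
    "The novel 'Moby-Dick' was written by Herman Melville.",
    "Harper Lee was born in 1926 in Monroeville, Alabama.",
]
inputs = [build_classify_input(query, doc) for doc in documents]
print(inputs[1])
```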

rawsh

3 days ago

print(results)
[RankResult(index=0, score=12.063760757446289, document="'To Kill a Mockingbird' is a novel by Harper Lee published in 1960."),
 RankResult(index=2, score=9.638206481933594, document='Harper Lee was born in 1926 in Monroeville, Alabama.'),
 RankResult(index=1, score=-1.3470792770385742, document="The novel 'Moby-Dick' was written by Herman Melville.")]

These do not match the inverse-sigmoid values of the scores returned by infinity:

curl -X 'POST' \
'https://<endpoint>/classify' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"input": [
  "<|endoftext|><|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n<|im_end|>\n<|im_start|>user\nquery: Who wrote '\''To Kill a Mockingbird'\''? \ndocument: '\''To Kill a Mockingbird'\'' is a novel by Harper Lee published in 1960. \nYou are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:<|im_end|>\n<|im_start|>assistant\n",

  "<|endoftext|><|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n<|im_end|>\n<|im_start|>user\nquery: Who wrote '\''To Kill a Mockingbird'\''? \ndocument: The novel '\''Moby-Dick'\'' was written by Herman Melville. \nYou are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:<|im_end|>\n<|im_start|>assistant\n",

  "<|endoftext|><|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.\n<|im_end|>\n<|im_start|>user\nquery: Who wrote '\''To Kill a Mockingbird'\''? \ndocument: Harper Lee was born in 1926 in Monroeville, Alabama. \nYou are a search relevance expert who evaluates how well documents match search queries. For each query-document pair, carefully analyze the semantic relationship between them, then provide your binary relevance judgment (0 for not relevant, 1 for relevant).\nRelevance:<|im_end|>\n<|im_start|>assistant\n"
],
"raw_scores": false
}'
{"object":"classify","data":[[{"score":0.9999667406082153,"label":"1"},{"score":0.00003321419353596866,"label":"0"}],[{"score":0.894789457321167,"label":"1"},{"score":0.10521053522825241,"label":"0"}],[{"score":0.9996947050094604,"label":"1"},{"score":0.00030534894904121757,"label":"0"}]],"model":"michaelfeil/mxbai-rerank-large-v2-seq","usage":{"prompt_tokens":1627,"total_tokens":1627},"id":"infinity-ddd4593f-5afa-45e3-a094-1ca25713605b","created":1743104764}

ln(0.9999667406082153/(1-0.9999667406082153)) = 10.3111401111

ln(0.894789457321167/(1-0.894789457321167)) = 2.14062493652

ln(0.9996947050094604/(1-0.9996947050094604)) = 8.09392672508
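The comparison can be reproduced with a short script. This is a sketch that uses only the numbers quoted in this thread; the helper names `logit` and `ranking` are illustrative.

```python
import math


def logit(p: float) -> float:
    """Inverse sigmoid (log-odds) of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def ranking(scores):
    """Document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Label "1" probabilities from the /classify response, in input order.
infinity_probs = [0.9999667406082153, 0.894789457321167, 0.9996947050094604]
# Scores from the RankResult output, re-ordered by document index (0, 1, 2).
reference_scores = [12.063760757446289, -1.3470792770385742, 9.638206481933594]

infinity_logits = [logit(p) for p in infinity_probs]

for i, (a, b) in enumerate(zip(infinity_logits, reference_scores)):
    print(f"doc {i}: infinity logit={a:.4f}  reference={b:.4f}  diff={a - b:+.4f}")

print("same ranking:", ranking(infinity_logits) == ranking(reference_scores))
```

On these three documents the ordering is the same (0, 2, 1), but the magnitudes differ, most visibly for the Moby-Dick document (about 2.14 versus -1.35). Three documents are too few to say anything about ranking quality on a real benchmark.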

michaelfeil

Owner 3 days ago

I am confused, what are you trying there? Why are you using /rerank? Rerank is not supported in infinity with this model. Do not use /rerank.

rawsh

2 days ago

edited 2 days ago

What do you mean? I'm using /classify; see the code snippet. My point is that the infinity results do not match the mixedbread implementation.

michaelfeil