HOW TO USE WITH TEI

#3 by nbroad

Set up the container:

```shell
model=ibm/re2g-reranker-nq
volume=$PWD/data
# specify this PR revision
revision=refs/pr/3

docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.6 \
    --model-id $model --revision $revision
```

Call the endpoint:

Because this model has two output classes, TEI can't treat it as a re-ranker (re-rankers have exactly one output label). The `/predict` route must therefore be used, which treats the model as a classifier.

```shell
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs": ["What is deep learning?", "Deep Learning\n\nDL is about machine learning and ai"]}' \
    -H 'Content-Type: application/json'
```

Text needs to be passed as a pair, `["Query", "Title\n\nPassage"]`, as mentioned here.
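For reference, here is the same request as a minimal Python sketch (assuming the container from the setup above is running locally; `requests` is the only dependency):

```python
import requests

# Query/passage pair; the passage is formatted as "Title\n\nPassage".
payload = {
    "inputs": [
        "What is deep learning?",
        "Deep Learning\n\nDL is about machine learning and ai",
    ]
}

resp = requests.post("http://127.0.0.1:8080/predict", json=payload)
resp.raise_for_status()
print(resp.json())  # classifier output (label/score pairs)
```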

## Note about formatting [WIP]

According to the code, I believe it is using the facebook/rag-token-nq tokenizer: see here
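As a quick sanity check, here is a minimal sketch (assuming `transformers` is installed) that loads the RAG tokenizer and prints which tokenizer it wraps for the question encoder:

```python
from transformers import RagTokenizer

# RagTokenizer bundles a question-encoder tokenizer and a generator
# tokenizer; Re2G uses the question-encoder half for its inputs.
tok = RagTokenizer.from_pretrained("facebook/rag-token-nq")
print(type(tok.question_encoder))  # a DPR (BERT-style) tokenizer class
```

The trace through the Re2G code: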

1. Example script (calls string_retrieve): https://github.com/IBM/kgi-slot-filling/blob/re2g/dpr/dpr_apply.py#L60
2. string_retrieve calls prepare_seq2seq_batch: https://github.com/IBM/kgi-slot-filling/blob/re2g/corpus/corpus_client.py#L108
3. prepare_seq2seq_batch calls tokenizer.question_encoder: https://github.com/IBM/kgi-slot-filling/blob/re2g/generation/rag_util.py#L268C5-L268C26
4. tokenizer.question_encoder is just the tokenizer that is passed to RagTokenizer.from_pretrained: https://github.com/huggingface/transformers/blob/5fa35344755d8d9c29610b57d175efd03776ae9e/src/transformers/models/rag/tokenization_rag.py#L54
5. The actual tokenizer passed to RagTokenizer is a DPRTokenizer (see the code here, and the files in the model repo)
6. Here is the code that calls the DPRTokenizer:

```python
tokenizer(
    src_texts,  # the batch of input strings
    add_special_tokens=True,
    max_length=512,
    padding="longest",
    truncation=True,
)
```

7. I am assuming that the format is `[CLS] query [SEP] passage [SEP]`, because of this code and because that is what you get from `tokenizer.decode(tokenizer.encode("query", "passage"))` (see the sketch after this list).
8. Since TEI will add the BOS and EOS tokens ([CLS] and [SEP]), only the middle [SEP] token needs to be added.
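A minimal sketch of that check (assuming the DPR question-encoder tokenizer that facebook/rag-token-nq wraps; the exact repo name here is my assumption):

```python
from transformers import AutoTokenizer

# Assumed: the question-encoder tokenizer behind facebook/rag-token-nq.
# It is a BERT-style tokenizer, so encoding a text pair shows the format.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/dpr-question_encoder-single-nq-base"
)

ids = tokenizer.encode("query", "passage")
print(tokenizer.decode(ids))
# Expected output: [CLS] query [SEP] passage [SEP]
```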
