[CLS] token representation or Pooled tokens?

#8
by aarabil - opened

How is the base model used during fine-tuning: do you use the [CLS] hidden token representation, or do you pool the tokens together somehow (e.g. averaging)?
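For context, here is a minimal sketch of the two options being asked about, assuming a Hugging Face `AutoModel` with `roberta-base` used only as a stand-in checkpoint: taking the first-token ([CLS]/`<s>`) hidden state versus mean pooling the token embeddings with the attention mask.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "roberta-base"  # stand-in; substitute the actual base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("A man is playing a guitar.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Option 1: [CLS] / first-token representation
cls_embedding = hidden[:, 0, :]

# Option 2: attention-mask-aware mean pooling over all tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()
mean_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```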

Coming back to this question, especially since the base model is a fine-tuned embedding model in this case. I'm wondering how you adapt this model, compared to a standard encoder, for the NLI task. Do you simply use the embedding model in the same way as the other models? If so, which separation token do you use?

I used the exact same script as for other encoder-only models like RoBERTa or DeBERTa via the HF trainer, so I assume that the trainer used the [CLS] token.
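In case it helps, this is roughly what that setup looks like, as a sketch only: the checkpoint name below is a placeholder, and `num_labels=3` assumes a standard entailment/neutral/contradiction label set. When the premise and hypothesis are passed to the tokenizer as a pair, it inserts the model-specific separator tokens automatically, and the default sequence-classification heads for encoder-only models like RoBERTa and DeBERTa read the first-token hidden state.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/deberta-v3-base"  # placeholder; substitute the actual base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3  # entailment / neutral / contradiction (assumed label set)
)

# Passing the sentences as a pair lets the tokenizer insert the
# model-specific special tokens (e.g. [CLS] ... [SEP] ... [SEP]).
premise = "A man is playing a guitar."
hypothesis = "A person is making music."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
print(tokenizer.decode(inputs["input_ids"][0]))  # shows where the separator lands

logits = model(**inputs).logits  # the head operates on the first-token representation
```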
