[CLS] token representation or Pooled tokens?
#8 · by aarabil · opened
How is the base model used during fine-tuning: do you use the [CLS] token's hidden representation, or do you pool the token representations somehow (e.g., by averaging)?
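For clarity, here is a minimal sketch of the two options I mean, using the `transformers` `AutoModel` API. The checkpoint name is only a placeholder for illustration, not the actual base model of this repo:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "microsoft/deberta-v3-base"  # placeholder; any encoder-only checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("A man is playing a guitar.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Option 1: use the hidden state of the first ([CLS]) token
cls_repr = hidden[:, 0, :]

# Option 2: mean-pool all token representations, ignoring padding
mask = inputs["attention_mask"].unsqueeze(-1).float()
mean_repr = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```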
Coming back to this question, especially since the base model here is a fine-tuned embedding model. I'm wondering how you adapt this model for the NLI task compared to a standard encoder. Do you simply use the embedding model in the same way as the other models? If so, which separator token do you use?
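As an aside on the separator question: for standard encoders the pair separator is inserted by the tokenizer when you pass the premise and hypothesis as a sentence pair, so no manual choice is needed. A quick check, with `roberta-base` used purely as an illustrative checkpoint:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # illustrative checkpoint

premise = "A man is playing a guitar."
hypothesis = "A person is making music."

encoded = tokenizer(premise, hypothesis)
print(tokenizer.sep_token)                     # "</s>" for RoBERTa, "[SEP]" for BERT/DeBERTa
print(tokenizer.decode(encoded["input_ids"]))  # shows how the pair is joined
```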
I used the exact same script as for other encoder-only models like RoBERTa or DeBERTa via the HF Trainer, so I assume that the [CLS] token was used.
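If I remember the `transformers` source correctly, the stock sequence-classification heads for these encoders do operate on the first token: RoBERTa's classification head and DeBERTa's `ContextPooler` both index `hidden_states[:, 0]`. A rough way to inspect this, again with an illustrative checkpoint and an assumed 3-label NLI setup:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",  # illustrative checkpoint
    num_labels=3,                 # e.g. entailment / neutral / contradiction
)
print(model.pooler)      # DeBERTa's ContextPooler, which takes hidden_states[:, 0]
print(model.classifier)  # linear layer applied on top of that pooled representation
```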