Tom Aarsen

tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Organizations

Hugging Face, Sentence Transformers, Sentence Transformers - Cross-Encoders, Hugging Face Internal Testing Organization, SetFit, Hugging Face Fellows, Massive Text Embedding Benchmark, Open-Source AI Meetup, Nomic AI, Hugging Face OSS Metrics, Blog-explorers, Sentence Transformers Testing, mLLM multilingual, Social Post Explorers, Answer.AI, gg-tt, Distillation Hugs, Hugging Face Discord Community, Bert ... but new

tomaarsen's activity

Normalizing once, after truncation at the very end, is sufficient. You don't have to normalize beforehand, although it doesn't hurt.

Good question! Embeddings should be (re-)normalized after the Matryoshka truncation. If you only normalize before truncating, the truncated embedding won't have exactly unit norm: dropping dimensions removes some of the vector's magnitude, so its norm ends up slightly below 1.
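A minimal sketch of the point above, using NumPy on a made-up embedding (the dimensions and seed are illustrative, not from any real model):

```python
import numpy as np

# Hypothetical 8-dim embedding, normalized to unit length up front.
rng = np.random.default_rng(0)
embedding = rng.standard_normal(8)
embedding = embedding / np.linalg.norm(embedding)  # norm == 1.0

# Matryoshka-style truncation: keep only the first 4 dimensions.
truncated = embedding[:4]

# The truncated vector lost some magnitude, so its norm is below 1.
assert np.linalg.norm(truncated) < 1.0

# Re-normalize once at the very end to restore unit length.
truncated = truncated / np.linalg.norm(truncated)
assert np.isclose(np.linalg.norm(truncated), 1.0)
```

This is why normalizing only before truncation is not quite enough: cosine similarity and dot-product scores assume unit-norm vectors, and the truncated slice no longer satisfies that until it is normalized again.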