s-emanuilov posted an update 2 days ago
Ran an MTEB evaluation on Bulgarian tasks comparing EmbeddingGemma-300M (google/embeddinggemma-300m) vs Multilingual-E5-Large (intfloat/multilingual-e5-large).

EmbeddingGemma-300M scored 71.6% average while E5-Large got 75.9%. Pretty solid results for EmbeddingGemma considering it's half the size and uses way less resources.

EmbeddingGemma actually beats E5-Large on sentiment analysis and natural language inference. E5-Large wins on retrieval and bitext mining tasks.

The 300M model also has a 4x longer context window (2048 vs 512 tokens) and a lower carbon footprint, which is good.

Both models work great for Bulgarian but have different strengths depending on what you need.

Blog article about the usage: https://huggingface.co/blog/embeddinggemma

PS: Don't forget to use the recommended library versions :D

pip install git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview
pip install "sentence-transformers>=5.0.0"

Quick question: were you on the recommended transformers version (https://huggingface.co/blog/embeddinggemma#sentence-transformers), and did you use the prompts (i.e., did you call the model with model.encode_query / model.encode_document / model.encode(..., prompt_name="query") in Sentence Transformers)?
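For reference, this is roughly what the prompt-aware calls look like in Sentence Transformers >= 5.0 (the example texts below are just placeholders):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

# Uses the model's built-in "query" prompt
query_emb = model.encode_query("What is the capital of Bulgaria?")

# Uses the model's built-in "document" prompt
doc_emb = model.encode_document("Sofia is the capital of Bulgaria.")

# Equivalent explicit form
query_emb = model.encode("What is the capital of Bulgaria?", prompt_name="query")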

There are a few ways for it to quietly go wrong!


Hey Tom,

First, appreciate your work! Thanks for everything you're doing.

I did use the prompt dict for "intfloat/multilingual-e5-large", passing prompts = {"query": "query: ", "passage": "passage: "} to SentenceTransformer.
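Roughly, in code (a minimal sketch of that setup):

from sentence_transformers import SentenceTransformer

# E5 expects its "query: " / "passage: " prefixes, so they are passed explicitly
model = SentenceTransformer(
    "intfloat/multilingual-e5-large",
    prompts={"query": "query: ", "passage": "passage: "},
)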

For "google/embeddinggemma-300m", I kept the default: model = SentenceTransformer("google/embeddinggemma-300m") and then evaluated with MTEB library, assuming that "MTEB will automatically detect and use these prompts if they are defined in your model's configuration," as written here https://sbert.net/docs/sentence_transformer/usage/mteb_evaluation.html

So in short, I did not add prompts for EmbeddingGemma but added them for multilingual-e5-large, as per its instructions (I didn't have time to check the model config, but I don't think they're set by default).

BUT, I ran with transformers==4.55.4, so I may need to re-run...
sentence-transformers was 5.1.0, which should be fine.

Thanks!