Ran an MTEB evaluation on Bulgarian tasks comparing EmbeddingGemma-300M (google/embeddinggemma-300m) vs Multilingual-E5-Large (intfloat/multilingual-e5-large).
EmbeddingGemma-300M scored 71.6% on average while E5-Large got 75.9%. Pretty solid result for EmbeddingGemma considering it's about half the size and uses far fewer resources.
EmbeddingGemma actually beats E5-Large on sentiment analysis and natural language inference. E5-Large wins on retrieval and bitext mining tasks.
The 300M model has a 4x longer context window (2048 vs 512 tokens) and a lower carbon footprint, which is a nice bonus.
Both models work great for Bulgarian but have different strengths depending on what you need.
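If you want to run something similar yourself, here's a minimal sketch using the mteb library together with sentence-transformers. Note that the task selection below (all tasks tagged with Bulgarian, "bul") is an assumption for illustration, not necessarily the exact task list behind the numbers above:

import mteb
from sentence_transformers import SentenceTransformer

# Load the smaller model; swap in intfloat/multilingual-e5-large to compare
# (E5 expects "query: " / "passage: " prefixes for best results).
model = SentenceTransformer("google/embeddinggemma-300m")

# Select all MTEB tasks that cover Bulgarian.
tasks = mteb.get_tasks(languages=["bul"])

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/embeddinggemma-300m")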
Blog post about usage: https://huggingface.co/blog/embeddinggemma
PS: Don't forget to use the recommended library versions :D
pip install git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview
pip install "sentence-transformers>=5.0.0"
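Once those are installed, getting Bulgarian embeddings is a few lines. A minimal sketch; the sentences below are just made-up examples:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

# A few example Bulgarian sentences.
sentences = [
    "Времето днес е слънчево и топло.",  # The weather today is sunny and warm.
    "Навън е слънчево и приятно.",       # It is sunny and pleasant outside.
    "Котката спи на дивана.",            # The cat is sleeping on the sofa.
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768) for EmbeddingGemma-300M

# Cosine similarities between all pairs; the first two sentences should score highest.
print(model.similarity(embeddings, embeddings))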