Tom Aarsen (tomaarsen)
3003 followers · 312 following
https://linkedin.com/in/tomaarsen · tomaarsen · tomaarsen.com
AI & ML interests
NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification
Recent Activity
liked a Space about 24 hours ago: Foaster/Werewolf_benchmark
liked a model 1 day ago: asmud/indonesian-embedding-small
replied to s-emanuilov's post 1 day ago:
Ran an MTEB evaluation on Bulgarian tasks comparing EmbeddingGemma-300M (https://huggingface.co/google/embeddinggemma-300m) vs Multilingual-E5-Large (https://huggingface.co/intfloat/multilingual-e5-large).

EmbeddingGemma-300M scored 71.6% on average while E5-Large got 75.9%. Pretty solid results for EmbeddingGemma considering it's half the size and uses far fewer resources. EmbeddingGemma actually beats E5-Large on sentiment analysis and natural language inference; E5-Large wins on retrieval and bitext mining tasks. The 300M model also has a 4x longer context window (2048 vs 512 tokens) and a lower carbon footprint. Both models work great for Bulgarian but have different strengths depending on what you need.

Blog article about the usage: https://huggingface.co/blog/embeddinggemma

PS: Don't forget to use the recommended library versions :D

```
pip install git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview
pip install "sentence-transformers>=5.0.0"
```
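For reference, an evaluation like this can be set up with the `mteb` package; the following is a minimal sketch assuming its current task-selection API, with the output folder name purely illustrative:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Model under evaluation; swap in "intfloat/multilingual-e5-large" to compare
model = SentenceTransformer("google/embeddinggemma-300m")

# Select all MTEB tasks that cover Bulgarian (ISO 639-3 code "bul")
tasks = mteb.get_tasks(languages=["bul"])

# Run the benchmark; per-task scores are written as JSON under the output folder
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/embeddinggemma-300m")
```

With the pinned versions above installed, basic retrieval-style usage looks roughly like this sketch; `encode_query`/`encode_document` are the Sentence Transformers v5 helpers, and the Bulgarian sentences are just example inputs:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

# Queries and documents are encoded with different task prompts under the hood
query_embedding = model.encode_query("Кой е най-високият връх в България?")
doc_embeddings = model.encode_document([
    "Мусала е най-високият връх на Балканския полуостров.",
    "София е столицата на България.",
])

# Cosine similarity between the query and each document
print(model.similarity(query_embedding, doc_embeddings))
```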
tomaarsen's Spaces (1)

GLiNER-medium-v2.1, zero-shot NER (Running, 172 likes): Identify key entities in text
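As a quick illustration of what this Space wraps, here is a minimal zero-shot NER sketch using the `gliner` library; the Hub model ID, example text, and entity labels are assumptions for demonstration:

```python
from gliner import GLiNER

# Load the GLiNER medium v2.1 checkpoint (assumed Hub model ID)
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

text = "Tom Aarsen maintains Sentence Transformers at Hugging Face in the Netherlands."

# Zero-shot: entity types are supplied at inference time, no fine-tuning needed
labels = ["person", "organization", "location"]
entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```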