Ame Vi

Ameeeee

AI & ML interests

None yet

Recent Activity

reacted to tomaarsen's post with 🔥 1 day ago
An assembly of 18 European companies, labs, and universities have banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.

🇪🇺 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very very useful sizes in my opinion
➡️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
⚙️ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
🔥 A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
📊 Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
📝 Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.

Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* https://huggingface.co/EuroBERT/EuroBERT-210m
* https://huggingface.co/EuroBERT/EuroBERT-610m
* https://huggingface.co/EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
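The post's remark about bi-directional (non-causal) attention can be illustrated with a toy attention mask. This is only a minimal sketch in plain Python and is not taken from the EuroBERT code; the helper name `attention_mask` and the 4-token sequence are assumptions made for illustration.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Hypothetical helper: build a seq_len x seq_len mask.

    Entry [i][j] is 1 when query position i may attend to key position j.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Decoder-style (causal): each token sees only itself and earlier tokens.
causal_mask = attention_mask(4, causal=True)

# Encoder-style (bi-directional): every token sees the whole sequence.
bidirectional_mask = attention_mask(4, causal=False)

for name, mask in (("causal", causal_mask), ("bidirectional", bidirectional_mask)):
    print(name)
    for row in mask:
        print(" ", row)
```

In the causal case the mask is lower-triangular, while the bi-directional mask is all ones, which is the property that lets an encoder use context on both sides of each token.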

Organizations

Hugging Face, Argilla, Women on Hugging Face, Data Is Better Together, Social Post Explorers, HuggingFaceFW-Dev, Data Is Better Together Contributor, Bluesky Community

Ameeeee's activity

upvoted an article 15 days ago

Synthetic data: save money, time and carbon with open source

• 63
upvoted an article 29 days ago

Announcing AI Energy Score Ratings

By sasha • 26
upvoted an article about 2 months ago

Fine-tune ModernBERT for RAG with Synthetic Data

By sdiazlor and 2 others • 37
upvoted an article 4 months ago

Let’s make a generation of amazing image generation models

By burtenshaw and 4 others • 33