
No Name

Ainonake

AI & ML interests

None yet

Organizations

None yet

Ainonake's activity

New activity in TheDrummer/Llama-3SOME-8B-v2 4 days ago

fp8

2
#6 opened 4 days ago by 010O11
New activity in Undi95/MLewd-ReMM-L2-Chat-20B 7 days ago
New activity in ByteDance-Seed/UI-TARS-1.5-7B 7 days ago

Ollama deployment

2
1
#7 opened 14 days ago by sedatkaradag
New activity in unsloth/Qwen3-235B-A22B-GGUF 13 days ago
New activity in nyuuzyou/archiveofourown 15 days ago
New activity in TheDrummer/Anubis-70B-v1 18 days ago

Gave it a whirl

1
#4 opened 19 days ago by SkyStach
New activity in nari-labs/Dia-1.6B 19 days ago
New activity in TheDrummer/Fallen-Command-A-111B-v1.1 about 1 month ago
reacted to tomaarsen's post with ❤️ 2 months ago
An assembly of 18 European companies, labs, and universities has banded together to launch πŸ‡ͺπŸ‡Ί EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.

πŸ‡ͺπŸ‡Ί 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very very useful sizes in my opinion
➑️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
βš™οΈ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
πŸ”₯ A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
πŸ“Š Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
πŸ“ Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.

Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
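For context, here is a minimal sketch of how one of these checkpoints could be loaded as a plain encoder with the transformers library. The model IDs come from the post above; the trust_remote_code flag and the mean-pooling step are my assumptions, not something stated in the post.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "EuroBERT/EuroBERT-210m"  # smallest of the three sizes listed above
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code is assumed to be needed for the custom Llama-based encoder architecture
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

texts = ["EuroBERT est un encodeur multilingue.", "EuroBERT is a multilingual encoder."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into one vector per text (a common choice before
# finetuning for retrieval or classification; the paper may do this differently).
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```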
New activity in Undi95/MistralThinker-v1.1 2 months ago

This shit is fire

13
#2 opened 2 months ago by
Ainonake
replied to Undi95's post 2 months ago

Then what if we do the same, but put the whole conversation in the first user input?

So it would be:
System prompt
User: conversation history
Then ask R1 to generate the thinking.

And the number of messages in the conversation history should be varied. Then the bot reply will always contain thinking.
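A rough sketch of that idea in code (my own reading of it; the endpoint, model name, and helper are hypothetical, only the OpenAI-compatible chat API is assumed):

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint serving an R1-style reasoning model.
client = OpenAI(base_url="https://example.com/v1", api_key="sk-...")

def build_messages(system_prompt: str, history: list[dict]) -> list[dict]:
    # Flatten the whole prior conversation into a single first user message.
    flat = "\n".join(f"{t['role']}: {t['content']}" for t in history)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Conversation so far:\n{flat}\n\nContinue as the bot."},
    ]

history = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! What would you like to do today?"},
    {"role": "user", "content": "Let's continue the story from yesterday."},
]
messages = build_messages("You are the bot in a roleplay chat.", history)
response = client.chat.completions.create(model="deepseek-r1", messages=messages)
# R1-style models emit a thinking trace before the reply; both the thinking and
# the reply would be kept as the training target for this example.
print(response.choices[0].message.content)
```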

replied to Undi95's post 2 months ago

What do you think about doing part of the dataset with replies generated from some existing context?

E.g. we have 50% of the data with thinking from the first user message, and some part of the dataset with

User,
Bot (no thinking),
User,
Bot (no thinking),
User, repeated N times,
then ask R1 to think at that point and train on it. So the model will understand long context better.
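In code, that mix could look roughly like this (my own sketch with a hypothetical helper; the 50% split and the random cut point are the parameters being discussed above):

```python
import random

def pick_context(conversation: list[dict], think_from_first_ratio: float = 0.5) -> list[dict]:
    """conversation alternates user/bot turns, none of which contain thinking."""
    user_idxs = [i for i, t in enumerate(conversation) if t["role"] == "user"]
    if random.random() < think_from_first_ratio:
        cut = user_idxs[0]              # ~50% of data: thinking from the first user message
    else:
        cut = random.choice(user_idxs)  # rest: thinking after N user/bot pairs, N varied
    # Everything up to and including the chosen user turn is the context; R1 is then
    # asked to think and reply at that point, and that reply becomes the training target.
    return conversation[: cut + 1]
```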