Stefano Fiorucci PRO

anakin87

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

new activity 4 days ago

community-spotlight/README:Nominate a model creator

new activity 4 days ago

community-spotlight/README:Nominate an educator

posted an update 4 days ago

Post

271

Your Language Model needs better (open) environments to learn 🌀

📝 https://huggingface.co/blog/anakin87/environments-hub

RL environments help LLMs practice, reason, and improve.
I explored the Environments Hub and wrote a walkthrough showing how to train and evaluate models using these open environments.

1️⃣ 𝗪𝗵𝘆 𝗥𝗟 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀

DeepSeek-R1 made clear that Reinforcement Learning can be used to incentivize reasoning in LLMs.
In GRPO, the model generates multiple answers and learns to prefer the better ones from rewards.
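
As a rough illustration of the "prefer the better ones" step, here is a minimal sketch of group-relative advantages: each answer's reward is compared with the mean of its own group and scaled by the group's spread. The helper name `group_relative_advantages` and the sample rewards are made up for illustration; the rest of the GRPO objective (clipping, KL penalty) is left out, and real implementations differ in details such as which standard deviation they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to its own group.

    Hypothetical helper for illustration only:
    advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four answers sampled for the same prompt, with illustrative rewards.
rewards = [0.0, 0.5, 1.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Answers with above-average reward get a positive advantage and are reinforced; below-average answers get a negative one. If every answer in the group earns the same reward, all advantages are zero and the group gives no learning signal.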

2️⃣ 𝗪𝗵𝗮𝘁 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁𝘀 𝗮𝗿𝗲
In classic RL, the environment is the world where the Agent lives, interacts, and gets rewards to learn.

We can also think of environments as software packages containing data, a harness, and scoring rules, which the model uses to learn and be evaluated.

Nowadays, the Agent is not just the LLM. It can use tools, from a weather API to a terminal.

This makes environments for training and evaluation more complex and critical.
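
To make the "data, harness, and scoring rules" idea concrete, here is a toy sketch of an alphabetical sort environment. Everything in it (`ToyEnvironment`, `sort_reward`, the two stand-in policies) is hypothetical and written for illustration; it is not the Verifiers API or the environment used in the walkthrough.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyEnvironment:
    """Hypothetical container: data plus a scoring rule; `evaluate` is the harness."""
    dataset: list[dict]
    reward_fn: Callable[[str, list[str]], float]

    def evaluate(self, policy: Callable[[str], str]) -> float:
        """Run the policy on every prompt and return the mean reward."""
        scores = [
            self.reward_fn(policy(example["prompt"]), example["answer"])
            for example in self.dataset
        ]
        return sum(scores) / len(scores)


def sort_reward(completion: str, expected: list[str]) -> float:
    """Fraction of positions where the completion matches the expected order."""
    predicted = completion.split()
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / max(len(predicted), len(expected), 1)


words = ["pear", "apple", "fig"]
env = ToyEnvironment(
    dataset=[{
        "prompt": "Sort alphabetically: " + " ".join(words),
        "answer": sorted(words),
    }],
    reward_fn=sort_reward,
)


def perfect_policy(prompt: str) -> str:
    # Stand-in for an LLM call: sorts the words after the colon.
    return " ".join(sorted(prompt.split(": ", 1)[1].split()))


def echo_policy(prompt: str) -> str:
    # Stand-in for a weak model: repeats the words in their original order.
    return prompt.split(": ", 1)[1]
```

Calling `env.evaluate(perfect_policy)` scores the stand-in policy on the dataset; the same reward function could serve as the training signal in an RL loop or as a plain evaluation metric.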

3️⃣ 𝐓𝐡𝐞 𝐨𝐩𝐞𝐧 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞

Big labs are advancing, but open models and the community still face a fragmented ecosystem.
We risk becoming users of systems built with tools we can't access or fully understand.

4️⃣ 𝐄𝐧𝐯𝐢𝐫𝐨𝐧𝐦𝐞𝐧𝐭𝐬 𝐇𝐮𝐛
That's why I was excited when Prime Intellect released the Environments Hub.

It's a place where people share RL environments: tasks you can use to train LLMs with RL (GRPO-style) or evaluate Agents.
Plus, the Verifiers library (@willcb) standardizes the creation of RL environments and evaluations.
Together, they can help keep science and experimentation open. 🔬

I explored the Hub and wrote a hands-on walkthrough 📝
- RL + LLMs basics
- Environments Hub navigation
- Evaluating models/Agents
- GRPO training of a tiny model on an alphabetical sort task

Take a look!

📝 https://huggingface.co/blog/anakin87/environments-hub

New activity in community-spotlight/README 4 days ago

Nominate a model creator

#1 opened 8 days ago by burtenshaw

Nominate an educator

#2 opened 8 days ago by burtenshaw

reacted to sergiopaniego's post with 🔥 5 days ago

Post

3802

You can now supercharge your TRL training pipelines with kernels

👷 kernels is a new library to load optimized compute kernels directly from the Hub

Combined with TRL, it makes your developer experience smoother & faster.

Check out the new guide to learn more! 🕺

Learn ➡️ https://huggingface.co/docs/trl/main/en/kernels_hub

upvoted 2 articles 5 days ago

Article

Welcome EmbeddingGemma, Google's new efficient embedding model

and 5 others • 5 days ago • 179

Article

Exploring Environments Hub: Your Language Model needs better (open) environments to learn

5 days ago • 22

upvoted a collection 5 days ago

EmbeddingGemma

Collection

3 items • Updated 5 days ago • 64

liked a dataset 5 days ago

uv-scripts/build-atlas

Updated 5 days ago • 183 • 4

published an article 5 days ago

Article

Exploring Environments Hub: Your Language Model needs better (open) environments to learn

5 days ago • 22

published a model 5 days ago

anakin87/Qwen3-0.6B-alphabet-sort-grpo

Text Generation • 0.6B • Updated 5 days ago • 45

updated a model 5 days ago

anakin87/Qwen3-0.6B-alphabet-sort-grpo

Text Generation • 0.6B • Updated 5 days ago • 45

updated 2 datasets 5 days ago

anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval

Viewer • Updated 5 days ago • 15 • 74

anakin87/Qwen3-0.6B-alphabet-sort-eval

Viewer • Updated 5 days ago • 15 • 85

commented on I trained a Language Model to schedule events with GRPO! 5 days ago

I haven't tried because I wanted to experiment with GRPO primarily.
But I agree, it would probably have helped improve performance.
Perhaps with SFT+GRPO you can use a smaller model (4B?) and still get decent results.

New activity in anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval 8 days ago