Hugging Face Agents Course


Recent Activity


thomwolf posted an update about 22 hours ago
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face, in robotics and across all AI fields, we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website, and the Pollen community and team here on the Hub at pollen-robotics.

We're so excited to build and share more open-source robots with the world in the coming months!
m-ric posted an update 15 days ago
🚀 The DeepSeek R1 moment has come for GUI agents: rule-based reinforcement learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers and using these to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified reward function that evaluates multiple responses from models, optimizing via policy algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses three things (sketched in the code example after this list):
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?
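As a concrete illustration, here is a minimal sketch of what a rule-based reward along these lines could look like. It is purely illustrative: the field names, the <think>/<answer> tag format, and the weights are my assumptions, not the paper's exact formulation.

import re

def ui_r1_style_reward(response: str, action: dict, gt_action: str, gt_bbox: tuple) -> float:
    """Toy rule-based reward combining action type, click coordinates, and output format."""
    reward = 0.0
    # Action type accuracy: the predicted action must match the ground truth
    if action.get("type") == gt_action:
        reward += 1.0
    # Coordinate accuracy: for clicks, the predicted point must fall inside the ground-truth box
    if action.get("type") == "click":
        x, y = action.get("coords", (-1, -1))
        x1, y1, x2, y2 = gt_bbox
        if x1 <= x <= x2 and y1 <= y <= y2:
            reward += 1.0
    # Output format: the raw response should contain explicit reasoning and a final answer
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", response, re.DOTALL):
        reward += 0.5
    return reward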

Using just 136 carefully selected mobile tasks (compared to 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌍 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only in low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)
thomwolf posted an update 16 days ago
The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out of the box and create any app or game in one shot.

It feels so powerful to me: no more complex frameworks or under-the-hood prompt engineering to get a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: even more meta, the DeepSite app and the DeepSeek model are both fully open-source => time to start recursively improving?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324
burtenshaw posted an update 19 days ago
NEW UNIT in the Hugging Face Reasoning course. We dive deep into the algorithm behind DeepSeek R1 with an advanced and hands-on guide to interpreting GRPO.

🔗 reasoning-course

This unit is super useful if you're tuning models with reinforcement learning. It will help with:

- interpreting loss and reward progression during training runs
- selecting effective parameters for training
- reviewing and defining effective reward functions (a toy example follows this list)
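To make that last point concrete, here's a toy reward function in the shape TRL's GRPOTrainer expects: it receives the generated completions and returns one score per completion. It assumes plain-text completions and is purely illustrative, not the unit's actual reward.

def brevity_reward(completions, **kwargs):
    # Toy reward: prefer completions close to a target length (illustrative only)
    target_chars = 400  # hypothetical target length
    return [1.0 - min(abs(len(c) - target_chars) / target_chars, 1.0) for c in completions]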

This unit also builds smoothly toward the existing practical exercises from @mlabonne and Unsloth.

📣 Shout out to @ShirinYamani who wrote the unit. Follow for more great content.
burtenshaw posted an update 25 days ago
The Hugging Face Agents Course now includes three major agent frameworks!

🔗 agents-course

This includes LlamaIndex, LangChain, and our very own smolagents. We've worked to integrate the three frameworks in distinctive ways so that learners can reflect on when and where to use each.

This also means that you can follow the course if you're already familiar with one of these frameworks, and soak up some of the fundamental knowledge in earlier units.

Hopefully, this makes the agents course accessible to as many people as possible.
m-ric posted an update about 1 month ago
smolagents now supports vLLM! 🥳

vLLM is one of the most popular local inference solutions, and the community had been asking us to integrate it: after a heavy refactoring of our LLM classes, we've just released smolagents 1.11.0, with a brand new VLLMModel class.

Go try it and tell us what you think!

https://github.com/huggingface/smolagents/blob/45b2c86857b7f7657daaa74e4d17d347e9e2c4a4/src/smolagents/models.py#L497
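For context, here's a minimal sketch of how the new class plugs into an agent. This assumes VLLMModel is importable from the package top level like the other model classes, and the model id is just an example:

from smolagents import CodeAgent, VLLMModel

# Serve a local model through vLLM and hand it to a CodeAgent
model = VLLMModel(model_id="Qwen/Qwen2.5-Coder-7B-Instruct")  # example model id
agent = CodeAgent(tools=[], model=model)
print(agent.run("What is the 20th Fibonacci number?"))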
burtenshaw posted an update about 1 month ago
The open LLM leaderboard is completed, retired, dead, 'ascended to a higher plane'. And in its shadow we have an amazing range of leaderboards built and maintained by the community.

In this post, I just want to list some of those great leaderboards that you should bookmark for staying up to date:

- Chatbot Arena LLM Leaderboard is the first port of call for checking out the best model. It's not the fastest because humans will need to use the models to get scores, but it's worth the wait. lmarena-ai/chatbot-arena-leaderboard

- OpenVLM Leaderboard is great for getting scores on vision language models: opencompass/open_vlm_leaderboard

- Ai2 are doing a great job on RewardBench and I hope they keep it up because reward models are the unsexy workhorse of the field. allenai/reward-bench

- The GAIA leaderboard is great for evaluating agent applications. gaia-benchmark/leaderboard

🤩 This seems like such a sustainable way of building for the long term, where rather than leaning on a single company to evaluate all LLMs, we share the load.
burtenshaw posted an update about 1 month ago
Still speedrunning Gemma 3 to think. Today I focused on setting up GPU-poor hardware to run GRPO.

This is a plain TRL and PEFT notebook which works on Apple silicon or a Colab T4. It uses the 1B variant of Gemma 3 and a reasoning version of the GSM8K dataset.

🧑‍🍳 There's more still in the oven, like releasing models, an Unsloth version, and deeper tutorials, but hopefully this should bootstrap your projects.

Here's a link to the 1B notebook: https://colab.research.google.com/drive/1mwCy5GQb9xJFSuwt2L_We3eKkVbx2qSt?usp=sharing
burtenshaw posted an update about 1 month ago
Everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I've found. Let's go!

1. You have to install everything from main and nightly. This is what I'm working with to get Unsloth and TRL running:

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


Plus these, installed with --no-deps:

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly


2. Will Brown's code to turn GSM8K into a reasoning dataset is a nice toy experiment: https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

3. With a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.

4. So far none of my runs have degraded the outputs after 1 epoch, so I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)


5. Vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. But there's no need to load the model differently in transformers or Unsloth:

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")


If you want an introduction to GRPO, check out the reasoning course; it walks you through the algorithm, theory, and implementation in a smooth way.

reasoning-course
burtenshaw posted an update about 1 month ago
Here's a notebook to make Gemma reason with GRPO & TRL. I made this whilst prepping the next unit of the reasoning course:

In this notebook I combine Google's model with some community tooling:

- First, I load the model from the Hugging Face Hub with transformers' latest release for Gemma 3
- I use PEFT and bitsandbytes to get it running on Colab
- Then, I took Will Brown's processing and reward functions to make reasoning chains from GSM8K
- Finally, I used TRL's GRPOTrainer to train the model (a rough sketch of the pipeline follows)
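Below is a rough sketch of that pipeline. The model id, LoRA and GRPO hyperparameters, dataset mapping, and reward function here are placeholders, not the notebook's exact values:

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import GRPOConfig, GRPOTrainer

# 1. Load a small Gemma 3 variant in 4-bit (bitsandbytes) so it fits on a Colab GPU
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",  # placeholder: the notebook may use a different variant
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# 2. Build a prompt-only dataset from GSM8K (Will Brown's processing adds the reasoning format)
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

# 3. Stand-in reward function; the notebook uses Will Brown's reward functions instead
def format_reward(completions, **kwargs):
    return [1.0 if "####" in completion else 0.0 for completion in completions]

# 4. Train with GRPO on top of a LoRA adapter
trainer = GRPOTrainer(
    model=model,
    reward_funcs=[format_reward],
    args=GRPOConfig(output_dir="gemma3-grpo", num_generations=2, max_completion_length=512),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()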

The next step is to bring Unsloth AI in, then ship it in the reasoning course. Link to the notebook below.

https://colab.research.google.com/drive/1Vkl69ytCS3bvOtV9_stRETMthlQXR4wX?usp=sharing
thomwolf posted an update about 1 month ago
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain Anthropic has historically been really strong at, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing it all: the training dataset, the new IOI benchmark, and more, in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
not-lain posted an update about 1 month ago
m-ric posted an update about 1 month ago
Our new Agentic leaderboard is now live! 💥

If you've ever asked which LLM is best for powering agents, we've just made a leaderboard that ranks them all! Built with @albertvillanova, this ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

๐Ÿ† GPT-4.5 comes on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!

The leaderboard also allows you to show the scores of vanilla LLMs (without any agentic setup) on the same benchmarks: this shows the huge improvements brought by agentic setups. 💪

(Note that results will be added manually, so the leaderboard might not always have the latest LLMs)
burtenshaw posted an update about 1 month ago
I'm super excited to work with @mlabonne to build the first practical example in the reasoning course.

🔗 reasoning-course

Here's a quick walkthrough of the first drop of material that works toward the use case:

- A fundamental introduction to reinforcement learning, answering questions like 'what is a reward?' and 'how do we create an environment for a language model?'

- Then it focuses on DeepSeek R1 by walking through the paper and highlighting key aspects. This is an old-school way to learn ML topics, but it always works.

- Next, it takes you to Transformers Reinforcement Learning (TRL) and demonstrates potential reward functions you could use. This is cool because it uses Marimo notebooks to visualise the reward.

- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length. I'm really into this because it works, and Maxime took the time to validate it and share assets and logging from his own runs for you to compare with.

Maxime's work and notebooks have been a major part of the open-source community over the last few years. I, like everyone, have learnt so much from them.
burtenshaw posted an update about 2 months ago
I made a real-time voice agent with FastRTC, smolagents, and Hugging Face Inference Providers. Check it out in this Space:

🔗 burtenshaw/coworking_agent
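For the curious, here's a rough sketch of how those pieces could fit together. It assumes fastrtc's Stream/ReplyOnPause helpers and smolagents' HfApiModel; the actual Space may be organised quite differently:

from fastrtc import ReplyOnPause, Stream, get_stt_model, get_tts_model
from smolagents import CodeAgent, HfApiModel

stt_model = get_stt_model()  # local speech-to-text
tts_model = get_tts_model()  # local text-to-speech
agent = CodeAgent(tools=[], model=HfApiModel())  # LLM served via HF inference providers

def respond(audio):
    # Transcribe the user's turn, let the agent answer, then stream synthesized speech back
    user_text = stt_model.stt(audio)
    answer = agent.run(user_text)
    yield from tts_model.stream_tts_sync(str(answer))

stream = Stream(handler=ReplyOnPause(respond), modality="audio", mode="send-receive")
stream.ui.launch()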
burtenshaw posted an update about 2 months ago
Now the Hugging Face agent course is getting real! With frameworks like smolagents, LlamaIndex, and LangChain.

🔗 Follow the org for updates: agents-course

This week we are releasing the first framework unit in the course, and it's on smolagents. This is what the unit covers:

- Why should you use smolagents vs another library?
- How to build agents that use code
- How to build multi-agent systems
- How to use vision language models for browser use

The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.