AI & ML interests

https://github.com/huggingface/cookbook

sergiopaniego posted an update 11 days ago
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently.

Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub.

Blog: https://huggingface.co/blog/openenv
Hub for Environments: openenv
OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Try it out using TRL: https://huggingface.co/docs/trl/main/en/openenv
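If you haven't worked with agent environments before, the core pattern is a Gym-style reset/step loop that the agent's policy interacts with. Here's a tiny, self-contained illustration of that loop; the class and method names are hypothetical and do not reflect the actual OpenEnv API (the repo and TRL docs above are the reference for that).

```python
# Hypothetical sketch of the reset/step loop an agent environment typically exposes.
# Names here are illustrative only, NOT the OpenEnv API.

class EchoEnv:
    """Toy environment: rewards the agent for echoing the target phrase."""

    def reset(self) -> str:
        self.target = "hello world"
        return f"Repeat exactly: {self.target}"

    def step(self, action: str) -> tuple[str, float, bool]:
        reward = 1.0 if action.strip() == self.target else 0.0
        return "episode finished", reward, True  # observation, reward, done


def run_episode(policy) -> float:
    env = EchoEnv()
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)           # e.g. a model generation
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print(run_episode(lambda obs: "hello world"))  # -> 1.0
```

In online training (e.g. GRPO with TRL), the rewards returned by loops like this are what the policy is optimized against.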
merve posted an update 14 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⬇️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-tokens-to-performance ratio
> covers 100 languages
sergiopaniego posted an update 17 days ago
New drop! 💥 The VLM Object Understanding Comparison Space now runs with Qwen3-VL-4B and moondream3.

You can compare how models reason about images 🧠

Bonus: thanks to @ariG23498, you now get auto-suggested prompts to explore faster.

Let's gooo

sergiopaniego/vlm_object_understanding
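If you'd rather poke at the Space from code than from the UI, gradio_client can drive it. A minimal sketch, assuming the Space exposes a Gradio API; the endpoint names and signatures aren't documented in the post, so they're discovered at runtime rather than hard-coded:

```python
# Query the comparison Space programmatically with gradio_client.
from gradio_client import Client

client = Client("sergiopaniego/vlm_object_understanding")
client.view_api()  # prints the callable endpoints and their parameters
```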
sergiopaniego posted an update 19 days ago
@Qwen released their new small and dense VLMs (Qwen3-VL).

They're incredibly capable and among my all-time favourite VLMs.

🤗 We've prepared some resources to help you get started.

> Fine-tune Qwen3-VL-4B with SFT or GRPO (free Colab notebooks):
> SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb
> GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb

> Compare object detection vs. Moondream3:
sergiopaniego/vlm_object_understanding

> Fine-tune from the CLI using TRL:
https://github.com/kashif/Qwen3-VL/blob/trl-sft/qwen-vl-finetune/README.md#trl-based-training-single-gpu
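For a feel of what the SFT notebook boils down to, here's a minimal TRL sketch. The Hub model id and the demo dataset below are assumptions for illustration; the free Colab above is the reference recipe, including the exact image/data handling.

```python
# Minimal SFT sketch with TRL for a Qwen3-VL-style model (illustrative, not the notebook verbatim).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed small vision-language SFT dataset; the notebook prepares its own data.
dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train[:1%]")

training_args = SFTConfig(
    output_dir="qwen3-vl-4b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-VL-4B-Instruct",  # assumed Hub id for the 4B instruct checkpoint
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```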
sergiopaniego posted an update 24 days ago
Super nice intro to fine-tuning with TRL, just dropped by @google (runs free on Colab)!

They use SFT + QLoRA to fine-tune the tiny Gemma 3 270M model for emoji generation.

Here's what the fine-tuned model generates for the prompt: “I'm learning to tweet” → 🐦🗣💻

Colab: https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Demos/Emoji-Gemma-on-Web/resources/Fine_tune_Gemma_3_270M_for_emoji_generation.ipynb
Try it out: google/emoji-gemma
Learn more: https://developers.googleblog.com/en/own-your-ai-fine-tune-gemma-3-270m-for-on-device/
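A rough sketch of the SFT + QLoRA setup described above (TRL + PEFT + bitsandbytes). The model id and the dataset name are placeholders I'm assuming for illustration; the Colab is the actual recipe.

```python
# Hedged sketch: 4-bit quantized base model + LoRA adapters, trained with TRL's SFTTrainer.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",  # assumed Hub id for the 270M instruct checkpoint
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
)

peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM")

# Placeholder dataset name: the Colab builds its own text -> emoji pairs.
dataset = load_dataset("my-org/text-to-emoji", split="train")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="gemma-3-270m-emoji", per_device_train_batch_size=4),
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```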
sergiopaniego posted an update 27 days ago
Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.

TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run it on Colab ⚡, then scale to multi-GPU/multi-node!

🧑‍🍳 recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training
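In code, switching GRPO generation to vLLM is mostly one config flag. A minimal sketch with a toy reward function; the model, dataset, and batch sizes are illustrative, and the recipe above covers the full single-GPU and multi-node setups.

```python
# Hedged sketch of GRPO with TRL's built-in vLLM generation.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_len(completions, **kwargs):
    """Toy reward: prefer completions close to 100 characters."""
    return [-abs(100 - len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # prompt-only dataset

training_args = GRPOConfig(
    output_dir="qwen-grpo-vllm",
    use_vllm=True,                 # offload rollouts to vLLM; needs a vLLM server
                                   # (`trl vllm-serve`) or colocate mode, see the recipe
    per_device_train_batch_size=8,
    num_generations=8,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```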
sergiopaniego posted an update 28 days ago
A few days ago, Thinking Machines Lab released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
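The gist of the setup, condensed into a hedged TRL/PEFT sketch: adapters on all linear layers (MLP included) and a noticeably higher learning rate than full fine-tuning. The model, dataset, rank, and LR below are placeholders; the guide has the actual values and ablations.

```python
# Hedged sketch of a "LoRA without regret"-style configuration with TRL + PEFT.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

peft_config = LoraConfig(
    r=256,                        # placeholder; the guide discusses how rank interacts with dataset size
    lora_alpha=16,
    target_modules="all-linear",  # LoRA on every linear layer (MLP included), a key finding of the write-up
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",   # placeholder model
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="lora-without-regret",
        learning_rate=2e-4,             # roughly ~10x a typical full fine-tuning LR, per the write-up
    ),
    peft_config=peft_config,
)
trainer.train()
```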
sergiopaniego posted an update about 1 month ago
You need to try this tool! 🫡

My colleague @Molbap built an interactive HF Space to explore the modular support of open models in transformers over time.

👀 You'll spot things like 🦙 llama defining many models, or which ones could be modular next.

Try it: Molbap/transformers-modular-refactor
sergiopaniego posted an update about 1 month ago
How fast can you spin up a Hugging Face Inference Endpoint with vLLM to deploy a state-of-the-art OCR model?

Let's break it down step by step.

1️⃣ Create your endpoint
Go to Hugging Face Endpoints → + NEW
Select Deploy from Hub → rednote-hilab/dots.ocr → Configure 🛠️

2️⃣ Configure hardware & container
Pick hardware: AWS/GPU/L4 ⚡
Set container: vLLM 🐇
Click Create ✅

3️⃣ Update endpoint settings
Container: set the Container URI to vllm/vllm-openai:nightly → Update
Advanced: add the flag --trust-remote-code → Update ⚠️

4️⃣ Run inference
Download the script 📝: ariG23498/useful-scripts
Set your HF_TOKEN and update base_url in the script.
Run it. ✅

Your OCR model is now live via HF Inference Endpoints!
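For reference, step 4's script essentially talks to the vLLM container's OpenAI-compatible API. A hedged sketch of that call: the endpoint URL, the image, and the served model name depend on your deployment, and the linked script is the authoritative version.

```python
# Call the deployed endpoint through vLLM's OpenAI-compatible chat API.
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ['ENDPOINT_URL']}/v1",  # your Inference Endpoint URL + /v1
    api_key=os.environ["HF_TOKEN"],               # the HF token is used as the API key
)

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",  # served model name; may differ depending on the endpoint config
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},  # placeholder image
            {"type": "text", "text": "Extract the text from this document."},
        ],
    }],
)
print(response.choices[0].message.content)
```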
sergiopaniego posted an update about 1 month ago
💥 Tons of new material just landed in the smol-course! 🧑‍💻

> evaluation
> alignment
> VLMs
> quizzes
> assignments!
> certificates! 👩‍🎓

go learn! 👉 https://huggingface.co/learn/smol-course/unit0/1
merve posted an update about 1 month ago
large AI labs open-sourced a ton of models last week 🔥
here are a few picks; find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer-use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
sergiopaniego posted an update about 1 month ago
This summer TRL leveled up for multimodal alignment 🌞

✅ New VLM alignment methods (MPO, GRPO, GSPO)
✅ Extended RLOO & Online DPO for VLMs
✅ Native SFT support
✅ Ready-to-use training scripts

🔗 https://huggingface.co/blog/trl-vlm-alignment
merve posted an update about 2 months ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released under an Apache 2.0 license 👏 use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
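Since it supports transformers out of the box, here's a hedged inference sketch. The processor/model auto classes and the prompt below are assumptions on my part; the model card and demo linked above are the reference.

```python
# Hedged sketch: run granite-docling on a document page image with transformers.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "ibm-granite/granite-docling-258M"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("page.png")  # placeholder: a scanned document page
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to docling."},  # assumed prompt
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```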
sergiopaniego posted an update about 2 months ago
Training long-context LLMs is getting easier!

TRL now supports Context Parallelism (CP), letting you scale sequences across multiple GPUs, even multi-node setups, seamlessly 💆
Combine TRL and accelerate, and you can run it effortlessly!

With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
Works for both full fine-tuning and LoRA, unlocking contexts that used to hit OOM 📈

Check out the full guide here 👉 https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism

If you want to learn more about Context Parallelism, check out the Ultrascale Playbook 👉 nanotron/ultrascale-playbook
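To make the sequence-sharding idea concrete, here's a toy, single-process illustration of what CP does to one long batch. This is just the sharding arithmetic, not the TRL/accelerate API; the guide above covers the real configuration.

```python
# Toy illustration of context parallelism: the *sequence* dimension of one long example
# is split across ranks, so each GPU only holds a slice of the 300k-token context.
import torch

seq_len, hidden, cp_size = 300_000, 4096, 8          # 300k tokens, 8-way context parallelism
sequence = torch.empty(seq_len, hidden, dtype=torch.bfloat16, device="meta")  # shape-only tensor

shards = torch.chunk(sequence, cp_size, dim=0)        # one contiguous sequence slice per rank
per_rank_tokens = shards[0].shape[0]
per_rank_gb = shards[0].numel() * shards[0].element_size() / 1e9
full_gb = sequence.numel() * sequence.element_size() / 1e9

print(f"tokens per rank: {per_rank_tokens}")          # 37500
print(f"this tensor per rank: {per_rank_gb:.2f} GB vs {full_gb:.2f} GB unsharded")
```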