Albert Villanova del Moral

albertvillanova

https://albertvillanova.github.io/

AI & ML interests

ML Engineer @ Hugging Face: Agents (Science)

Recent Activity

new activity 7 days ago

trl-internal-testing/tiny-Gemma3ForConditionalGeneration: Upload Gemma3ForConditionalGeneration

New activity in trl-internal-testing/tiny-Gemma3ForConditionalGeneration 7 days ago

Upload Gemma3ForConditionalGeneration
#10 opened 7 days ago by albertvillanova

Upload Gemma3ForConditionalGeneration
#9 opened 7 days ago by albertvillanova

Upload Gemma3ForConditionalGeneration
#8 opened 7 days ago by albertvillanova

upvoted an article 9 days ago

Article

Running AI agents to automate outreach at scale

10 days ago

reacted to qgallouedec's post with 🔥🚀 10 days ago

Post

7879

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: a new training template with {% generation %} markers, tool-call response schema routing, and tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

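from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumption: any conversational dataset works here; trl-lib/Capybara is
# only a placeholder and is not named in the original post.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    # Compute the loss on assistant tokens only.
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()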

So does GRPO tool-calling: just pass tools=[...] to GRPOTrainer.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
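
As a rough illustration of what "assistant-only loss" means in the post above: once a chat template carries {% generation %} markers, a tokenizer can report which tokens belong to assistant turns, and the remaining tokens can be excluded from the loss. The sketch below is not TRL's implementation. The function name `mask_non_assistant`, the token IDs, and the mask are hypothetical, and the only outside assumption is that -100 is the label value PyTorch's cross-entropy loss ignores by default.

```python
# Minimal sketch of assistant-only label masking (illustrative, not TRL's code).

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_non_assistant(input_ids, assistant_mask):
    """Return labels in which only assistant tokens contribute to the loss.

    input_ids:      list of token IDs for the full conversation
    assistant_mask: list of 0/1 flags, 1 where the token is part of an
                    assistant turn (hypothetical input for this sketch)
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Hypothetical conversation: five prompt tokens, then four assistant tokens.
input_ids = [101, 7, 8, 9, 102, 21, 22, 23, 102]
assistant_mask = [0, 0, 0, 0, 0, 1, 1, 1, 1]

labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)
```

With labels built this way, system and user tokens still serve as context for the model but produce no gradient, which is the behavior the `assistant_only_loss=True` option is described as enabling.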

New activity in trl-internal-testing/tiny-GptOssForCausalLM 15 days ago

Upload GptOssForCausalLM
#2 opened 15 days ago by albertvillanova

Upload GptOssForCausalLM
#1 opened 15 days ago by albertvillanova

upvoted an article about 1 month ago

Article

TRL v1.0: Post-Training Library Built to Move with the Field

Mar 31

updated a dataset about 2 months ago

albertvillanova/tmp-json-dtype

Viewer • Updated Mar 19 • 8 • 9

published a dataset about 2 months ago

albertvillanova/tmp-json-dtype

Viewer • Updated Mar 19 • 8 • 9

upvoted an article about 2 months ago

Article

Introducing Storage Buckets on the Hugging Face Hub

Mar 10 • 194

published an article about 2 months ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

Mar 10 • 142

posted an update 2 months ago

Post

2571

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0