Dropping downstream models that use newly initialized classifier parameters ([classifier.bias & classifier.weight]) to support domain-specific image classification. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST for experimental testing.
Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.
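As a rough illustration of the setup described above, here is a minimal sketch of loading the base checkpoint with a freshly initialized classification head in Transformers; the label count is an assumption (Fashion-MNIST's 10 classes), not something taken from the model card:

```python
# Minimal sketch: SigLIP2 with a newly initialized classifier head.
# num_labels=10 is an assumption (Fashion-MNIST has 10 classes).
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "google/siglip2-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(
    model_id,
    num_labels=10,  # classifier.weight & classifier.bias start randomly initialized
)
```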
Play with Orpheus TTS, a Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis.
If you've been following along with the Xet Team's (https://huggingface.co/xet-team) work, you know we've been working to migrate the Hugging Face Hub from Git LFS to Xet.
Recently, we launched a waitlist to join the movement to Xet (join here! https://huggingface.co/join/xet), but getting to this point was a journey.
From the initial proof of concept in August, to launching internally on the Hub, to migrating a set of repositories and routing a small chunk of download traffic through our infrastructure, every step of the way has been full of challenges, big and small, and well worth the effort.
Over the past few weeks, with real traffic flowing through our services, we've tackled some truly gnarly issues (unusual upload/download patterns, memory leaks, load imbalances, and more) and resolved each without major disruptions.
If you're curious about how this sliver of Hub infrastructure looked as we routed traffic through it for the first time (and want a deep dive full of Grafana and Kibana charts), I have a post for you.
Here's an inside look into the day of our first migrations and the weeks following, where we pieced together solutions in real time.
Introducing OneSQL-v0.1, our first text-to-SQL model, based on Qwen2.5-Coder. This model has achieved an EX score of 63.33 on the BIRD leaderboard (https://bird-bench.github.io/).
My goal is to make OneSQL the most usable open-weights model for text-to-SQL. I'm currently working on best practices to help users use this model the right way and avoid pitfalls. After that, I plan to train the next version to push for a higher EX score.
Enjoy this model, and feel free to share comments and questions 🤗
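Since the model is a standard Qwen2.5-Coder fine-tune, a plain Transformers text-generation pipeline should be enough to try it. The repo id and prompt layout below are illustrative assumptions; check the model card for the recommended prompt format:

```python
# Hedged sketch: prompting a text-to-SQL model with Transformers.
# The repo id and the comment-style prompt are placeholders, not the official ones.
from transformers import pipeline

generator = pipeline("text-generation", model="onekq-ai/OneSQL-v0.1", device_map="auto")
prompt = (
    "-- Schema: CREATE TABLE employees(id INT, name TEXT, salary INT);\n"
    "-- Question: Who are the three highest-paid employees?\n"
    "SELECT"
)
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```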
Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail, and now it's time to summarize other existing types of attention.
Here is a list of 15 types of attention mechanisms used in AI models:
3. Self-attention -> Attention Is All You Need (1706.03762) Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.
5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762) Multiple attention "heads" are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values (a single-head sketch follows below).
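To make the mechanics concrete, here is a minimal single-head sketch in PyTorch, the building block that MHA runs several times in parallel; the shapes and random weights are purely illustrative:

```python
# Minimal sketch of scaled dot-product self-attention (one head).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); every token attends to every other token
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)  # query-key similarity, scaled
    weights = F.softmax(scores, dim=-1)      # how much each token "borrows" from the others
    return weights @ v                       # new representation as a weighted mix of values

d_model = 8
x = torch.randn(5, d_model)                  # 5 tokens, illustrative embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 8])
```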
Page: https://huggingface.co/strangerzonehf Describe the artistic properties by posting sample images or links to similar images in the request discussion. If the adapters you're asking for are truly creative and safe for work, I'll train and upload the LoRA to the Stranger Zone repo!
Diffusion models are widely used for image and video generation but remain underexplored in text generation, where autoregressive models (ARMs) dominate. Unlike ARMs, which produce tokens sequentially, diffusion models iteratively refine noise through denoising steps, offering greater flexibility and speed. Recent advancements show a shift toward using diffusion models in place of, or alongside, ARMs. Researchers also combine strengths from both methods and integrate autoregressive concepts into diffusion.
Here are 5 new implementations of diffusion models:
1. Mercury family of diffusion LLMs (dLLMs) by Inception Labs -> https://www.inceptionlabs.ai/news It applies diffusion to text and code data, enabling sequence generation 10x faster than today's top LLMs. Now available, Mercury Coder can run at over 1,000 tokens/sec on NVIDIA H100s.
3. LLaDA -> Large Language Diffusion Models (2502.09992) Shows diffusion models' potential to replace ARMs. Trained with pre-training and SFT, LLaDA masks tokens, predicts them via a Transformer, and optimizes a likelihood bound. LLaDA matches key LLM skills and surpasses GPT-4o in reversal poetry (see the toy sketch after this list).
5. General Interpolating Discrete Diffusion (GIDD) -> Generalized Interpolating Discrete Diffusion (2503.04482) A flexible noising process with a novel diffusion ELBO enables combining masking and uniform noise, allowing diffusion models to correct mistakes, where ARMs struggle.
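To give a feel for how masked-diffusion decoding differs from left-to-right generation, here is a toy illustration of iterative unmasking. This is a conceptual sketch, not LLaDA's or Mercury's actual algorithm; `predict` stands in for any model that maps token ids to per-position logits:

```python
# Toy sketch of iterative unmasking in diffusion-style text generation.
# `predict` is a stand-in for any masked-token model (ids -> per-position logits).
import torch

def iterative_unmask(predict, ids, mask_id, steps=4):
    out = ids.clone()
    for _ in range(steps):
        masked = out == mask_id
        if not masked.any():
            break
        probs = predict(out).softmax(dim=-1)       # (seq_len, vocab_size)
        conf, guess = probs.max(dim=-1)            # per-position confidence and argmax
        conf[~masked] = -1.0                       # keep already-revealed tokens fixed
        k = max(1, int(masked.sum().item()) // 2)  # reveal half the remaining masks per step
        top = conf.topk(k).indices
        out[top] = guess[top]
    return out

vocab, seq_len, mask_id = 100, 8, 0
dummy = lambda ids: torch.randn(ids.shape[0], vocab)  # random "model" for demo only
print(iterative_unmask(dummy, torch.zeros(seq_len, dtype=torch.long), mask_id))
```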
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight, which is that agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question'. After all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
As a thank-you to the community, he shared 100 invite codes, first-come first-served: just use "HUGGINGFACE" to get access!
Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments.
Here's why this is a game-changer for agent-based systems:
1️⃣ Security First: Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.
2️⃣ Deterministic & Reproducible Runs: By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting. No more environment mismatches or dependency issues!
3️⃣ Resource Control & Limits: Docker and E2B allow you to enforce CPU, memory, and execution-time limits, so rogue or inefficient agents don't spiral out of control.
4️⃣ Safer Code Execution in Production: Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.
5️⃣ Easy to Integrate: With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend; no need for complex security setups (see the snippet after this list)!
6️⃣ Perfect for Autonomous AI Agents: If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.
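As a minimal sketch of what that configuration looks like (class names may differ across smolagents versions, and a local Docker daemon is assumed):

```python
# Hedged sketch: running a smolagents CodeAgent in a Docker sandbox.
# Swap executor_type="e2b" to use an E2B cloud sandbox instead.
from smolagents import CodeAgent, HfApiModel

model = HfApiModel()  # defaults to a Hub-hosted inference model
agent = CodeAgent(tools=[], model=model, executor_type="docker")
agent.run("Compute the 20th Fibonacci number.")
```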
Here's a quick walkthrough of the first drop of material that works toward the use case:
- a fundamental introduction to reinforcement learning, answering questions like "what is a reward?" and "how do we create an environment for a language model?"
- Then it focuses on DeepSeek-R1 by walking through the paper and highlighting key aspects. This is an old-school way to learn ML topics, but it always works.
- Next, it takes you to Transformers Reinforcement Learning (TRL) and demonstrates potential reward functions you could use (see the sketch after this list). This is cool because it uses Marimo notebooks to visualise the reward.
- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length. I'm really into this because it works, and Maxime took the time to validate it and share assets and logging from his own runs for you to compare with.
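As a rough sketch of what such a length-reducing reward function can look like with TRL's GRPOTrainer (the model and dataset ids below are placeholders, not the ones from Maxime's notebook):

```python
# Hedged sketch: a brevity reward for GRPO in TRL.
# Model and dataset ids are illustrative placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def brevity_reward(completions, **kwargs):
    # Shorter completions get higher (less negative) rewards.
    return [-len(c) / 100.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=brevity_reward,
    args=GRPOConfig(output_dir="grpo-brevity"),
    train_dataset=load_dataset("trl-lib/tldr", split="train"),
)
trainer.train()
```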
Maxime's work and notebooks have been a major part of the open source community over the last few years. I, like everyone, have learnt so much from them.
This week we are releasing the first framework unit in the course, and it's on smolagents. This is what the unit covers:
- why should you use smolagents vs another library?
- how to build agents that use code
- build multi-agent systems (see the sketch below)
- use vision language models for browser use
The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.
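For a taste of the multi-agent part, here is a hedged sketch of a manager agent delegating to a managed search agent in smolagents; the agent name, the search tool choice, and the model class are illustrative and may vary by version:

```python
# Hedged sketch: a manager CodeAgent delegating to a managed web-search agent.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, ToolCallingAgent

model = HfApiModel()
web_agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    name="web_search",  # managed agents need a name and description for the manager
    description="Runs web searches and returns the results.",
)
manager = CodeAgent(tools=[], model=model, managed_agents=[web_agent])
manager.run("Find the release year of the first Hugging Face Transformers version.")
```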