👀 Multimodal
- Mistral AI released a 24B vision LM, in both base and instruction-tuned versions, sota 🔥 (OS)
- With IBM we released SmolDocling, a sota 256M document parser with an Apache 2.0 license (OS)
- SpatialLM is a new vision LM that outputs 3D bounding boxes; it comes in 0.5B (QwenVL-based) and 1B (Llama-based) variants
- SkyWork released SkyWork-R1V-38B, a new vision reasoning model (OS)

💬 LLMs
- NVIDIA released new Nemotron models in 49B and 8B, along with their post-training dataset
- LG released EXAONE, new reasoning models in 2.4B, 7.8B, and 32B
- Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
- Dataset: NVIDIA released HelpSteer3, a new helpfulness dataset
- Dataset: OpenManusRL is a new agent dataset based on the ReAct framework (OS)
- The Open-R1 team released OlympicCoder, new competitive coder models in 7B and 32B
- Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
- Roboflow released RF-DETR, a new real-time sota object detector (OS) 🔥
- YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
- Stability AI released Stable Virtual Camera, a new novel view synthesis model
- Tencent released Hunyuan3D-2mini, a new small and fast 3D asset generation model
- ByteDance released InfiniteYou, a new realistic photo generation model
- StarVector is a new 8B model that generates SVG from images
- FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
- Sesame released CSM-1B, a new speech generation model (OS)

🤖 Robotics
- NVIDIA released GR00T, a new robotics model for generalized reasoning and skills, along with its dataset
vLLM is one of the most popular local inference solutions, and the community had been asking us to integrate it: after a heavy refactoring of our LLM classes, we've just released smolagents 1.11.0, with a brand new VLLMModel class.
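Here's a minimal sketch of what that looks like in practice (the model id is just an example, and exact constructor arguments may differ across versions; check the smolagents docs):

```python
# Minimal sketch: running a smolagents CodeAgent on top of vLLM.
# Assumes `pip install "smolagents[vllm]"`; the model id is an arbitrary
# example and constructor arguments may differ across versions.
from smolagents import CodeAgent, VLLMModel

model = VLLMModel(model_id="Qwen/Qwen2.5-Coder-7B-Instruct")
agent = CodeAgent(tools=[], model=model)

print(agent.run("What is the 10th Fibonacci number?"))
```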
We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
OlympicCoder is beating Claude 3.7 on (competitive) programming, a domain where Anthropic has historically been really strong, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!
And the best part is that we're open-sourcing everything about it: the training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3
We find that the OlympicCoder models outperform Claude 3.7 Sonnet, as well as models over 100x larger 💪
Together with the models, we are releasing:
📊 CodeForces-CoTs: a new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python: open-r1/codeforces-cots
🏆 IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance: open-r1/ioi
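If you want to poke at them, here's a minimal sketch of loading both from the Hub (the config and split names are assumptions; check the dataset cards for the exact layout):

```python
# Minimal sketch: loading the released artifacts from the Hub.
# Config/split names are assumptions; check the dataset cards
# (open-r1/codeforces-cots, open-r1/ioi) for the exact layout.
from datasets import load_dataset

cots = load_dataset("open-r1/codeforces-cots", "solutions", split="train")
print(cots[0].keys())

ioi = load_dataset("open-r1/ioi", split="train")
print(ioi[0].keys())
```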
If you've ever asked which LLM is best for powering agents, we've just made a leaderboard that ranks them all! Built with @albertvillanova, it ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅
🏆 GPT-4.5 comes out on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!
The leaderboard also allows you to show the scores of vanilla LLMs (without any agentic setup) on the same benchmarks: this shows the huge improvements brought by agentic setups. 💪
(Note that results will be added manually, so the leaderboard might not always have the latest LLMs)
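To give a flavor of the two setups being compared, here's an illustrative sketch; this is not the leaderboard's exact harness, and the model id, tool, and question are examples only:

```python
# Illustrative sketch of "vanilla LLM" vs "agentic setup" on one question.
# NOT the leaderboard's exact harness; model id, tool, and question are
# examples only.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

model = HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct")
question = "How many seconds would a leopard at full speed take to cross Pont des Arts?"

# Vanilla setup: a single chat completion, no tools, no iteration.
vanilla_answer = model([{"role": "user", "content": question}])

# Agentic setup: a CodeAgent that can search the web, write code, and iterate.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
agentic_answer = agent.run(question)
```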
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).
He shared an interesting insight: agentic capabilities might be more of an alignment problem than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question'; after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.
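To make that concrete, here's one hypothetical agentic trajectory serialized as a chat-format training sample; the schema, tool name, and figures are illustrative, not from any specific dataset:

```python
# Hypothetical agentic trajectory as a chat-format training sample: the
# target teaches the model to plan, call a tool, read the observation,
# and then conclude, instead of answering in one monolithic response.
# Schema, tool name, and figures are illustrative only.
trajectory = [
    {"role": "user", "content": "What is the population density of Monaco?"},
    {"role": "assistant", "content": (
        "Thought: I need Monaco's population and area.\n"
        "Action: search('Monaco population and area')")},
    {"role": "tool", "content": "Population ~38,000; area ~2.1 km^2."},
    {"role": "assistant", "content": (
        "Thought: density = 38000 / 2.1, which is roughly 18,000 per km^2.\n"
        "Final answer: about 18,000 people per km^2.")},
]
# Fine-tuning on many such multi-step targets, rather than single-shot
# answers, is the kind of post-training being described here.
```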
As a thank you to the community, he shared 100 invite codes, first-come first-served; just use “HUGGINGFACE” to get access!
Extremely bullish on @CohereForAI's Aya Vision (8B & 32B): new SOTA open-weight VLMs
- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated in transformers from Day 0!
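Since it's in transformers from day 0, a minimal usage sketch looks roughly like this (the repo id and processor calls follow the standard image-text-to-text pattern; check the model card for the exact snippet):

```python
# Rough sketch of running Aya Vision with transformers; the repo id and
# chat-template call follow the common image-text-to-text pattern and may
# differ slightly from the official snippet on the model card.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "CohereForAI/aya-vision-8b"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
        {"type": "text", "text": "Caption this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```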