Hey all! Finally, it's happening. DeepGit Lite is back, now running on CPU-only devices. Smartly search across GitHub, spin up conversational agents in the background, and have grounded conversations with repositories. Try it out now: zamal/DeepGit
In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago, adding a curated catalog of 10,000 models accessible from Azure AI Foundry and Azure ML!
@alvarobartt has been cooking these last few days to prepare the one and only documentation you need to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, smolagents, and more to come very soon.
We need your feedback: come help us and let us know what else you want to see, which models we should add to the collection, which model tasks we should prioritize, and what else we should build tutorials for. You're just an issue away on our GitHub repo!
Dataset Viewer for PDFs just landed on Hugging Face 📖🤗 You can now preview PDFs more easily than before!
On top of this, there's the PdfFolder format to load PDF datasets quicker 💨 > to use it, your dataset should follow a directory structure like folder/train/doc1.pdf, folder/train/doc2.pdf > if you want to include bounding boxes, labels, etc., you can keep them in a metadata.csv file in the same folder 🤝
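A minimal sketch of loading such a dataset with 🤗 Datasets; the "pdffolder" loader name follows the imagefolder convention, and the exact column names in the decoded examples may differ:

```python
from datasets import load_dataset

# Hedged sketch: load a PdfFolder-style dataset laid out as
# folder/train/doc1.pdf, folder/train/doc2.pdf, ...
# (loader name assumed to follow the imagefolder convention)
dataset = load_dataset("pdffolder", data_dir="folder")
print(dataset["train"][0])  # e.g. {"pdf": <decoded PDF object>, ...}
```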
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing guest speakers, and more... for all the details, head to https://lu.ma/fmvdjmur!
🚀 SmolAgents v1.19.0 is live! This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience: making it easier than ever to build smart, interactive AI agents. Here's what's new:
🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling (sketch below)
- Output formatting now uses XML tags for consistency
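For instance, a minimal sketch of the new context-manager support; the model class name matches recent smolagents releases, and the empty tool list is just a placeholder:

```python
from smolagents import CodeAgent, InferenceClientModel

# Hedged sketch: run an agent as a context manager so its resources
# (e.g. remote executors) are cleaned up automatically on exit.
with CodeAgent(tools=[], model=InferenceClientModel()) as agent:
    print(agent.run("What is the 20th Fibonacci number?"))
```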
🖥️ UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos
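In practice that looks something like this (the agent setup is a placeholder; only the reset_agent_memory flag comes from this release):

```python
from smolagents import CodeAgent, GradioUI, InferenceClientModel

# Hedged sketch: launch the Gradio UI with reset_agent_memory=True so every
# conversation starts from a clean agent memory (handy for demos).
agent = CodeAgent(tools=[], model=InferenceClientModel())
GradioUI(agent, reset_agent_memory=True).launch()
```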
🔄 Streaming Refactor
- Streaming event aggregation moved off the Model class ➡️ better architecture & maintainability
📦 Output Tracking
- CodeAgent outputs are now stored in ActionStep ✅ more visibility and structure for agent decisions
🐛 Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final-answer matching
📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format
🖼️ VLMs/OCR
> moonshotai/Kimi-VL-A3B-Thinking-2506 is a powerful reasoning vision LM: 3B active params, smarter with fewer tokens, supports long documents and videos 👏 (OS)
> nanonets/Nanonets-OCR-s is a 3.75B-param OCR model based on Qwen2.5VL-3B-Instruct (OS)
🗣️ Audio
> Google released google/magenta-realtime for real-time music generation & audio synthesis (CC-BY-4.0)
> Kyutai released new speech-to-text models in 1B & 2B sizes (kyutai/stt-1b-en_fr, kyutai/stt-2b-en_fr) with 0.5s and 2.5s delay
y'all have been asking my opinion on how OCR models compare to each other 👀 I'll leave you with three apps by @prithivMLmods to compare the newest models instead ⤵️
> compare Nanonets-OCR-s, Qwen2-VL-OCR-2B-Instruct, RolmOCR, Aya-Vision: prithivMLmods/Multimodal-OCR
> SmolDocling, Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B: prithivMLmods/Multimodal-OCR2
> docscopeOCR, MonkeyOCR, coreOCR: prithivMLmods/core-OCR
so far I've figured out:
> for fact-checks, you need a relatively bigger model (7B is ok!)
> Gemma 3 gets downgraded without pan & scan (especially for 📑)
> Qwen2.5VL-32B is very talkative: great for reasoning but not good for simple tasks 🗣️
Build your first chatbot with a Hugging Face Spaces frontend and Gaudi-powered backend with @bconsolvo! He will teach you how to build an LLM-powered chatbot using Streamlit and Hugging Face Spaces, integrating a model endpoint hosted on an Intel® Gaudi® accelerator.
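As a taste of what the workshop covers, here's a rough sketch of a Streamlit chat front end calling a hosted model endpoint via huggingface_hub; the endpoint URL is a placeholder, and the workshop's actual stack may differ:

```python
import streamlit as st
from huggingface_hub import InferenceClient

# Rough sketch: Streamlit chat UI backed by a hosted model endpoint.
# base_url is a placeholder for your Gaudi-hosted endpoint.
client = InferenceClient(base_url="https://your-endpoint.example.com")

st.title("My first chatbot")
if prompt := st.chat_input("Say something"):
    st.chat_message("user").write(prompt)
    out = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    st.chat_message("assistant").write(out.choices[0].message.content)
```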
the method is simple: find the tokens with the highest attention scores, merge the rest of the tokens based on similarity, then combine the two sets
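A toy PyTorch sketch of that idea; the function name, the CLS-attention scoring, and the averaging rule are my assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def merge_visual_tokens(tokens, attn_scores, keep: int):
    """Toy sketch: keep high-attention tokens, merge the rest into them.

    tokens: (N, D) visual token embeddings
    attn_scores: (N,) per-token attention score (e.g. CLS attention)
    keep: number of high-attention tokens to keep as-is
    """
    # 1. Keep the tokens with the highest attention scores.
    top_idx = attn_scores.topk(keep).indices
    rest_mask = torch.ones(tokens.size(0), dtype=torch.bool)
    rest_mask[top_idx] = False
    kept, rest = tokens[top_idx], tokens[rest_mask]

    # 2. Assign each remaining token to its most similar kept token.
    sim = F.normalize(rest, dim=-1) @ F.normalize(kept, dim=-1).T
    assign = sim.argmax(dim=-1)

    # 3. Average each group into its kept token, combining both sets.
    merged = kept.clone()
    for i in range(keep):
        group = rest[assign == i]
        if group.numel():
            merged[i] = (kept[i] + group.sum(0)) / (1 + group.size(0))
    return merged
```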
their method works both training-free and with fine-tuning: the authors report a 5-point average improvement across vision-language tasks, plus an 8x improvement in prefill time for Llava-Next 7B and 13B 🤯
removing redundant tokens improves image token quality too 🥹
we have launched Kernel Hub: easy optimized kernels for all models on Hugging Face 🔥 use them right away! it's where the community publishes optimized kernels 🤝
this release comes in three parts:
> Kernel Hub: contains (as of now) 14 kernels
> kernels: a Python library to load kernels from the Kernel Hub
> kernel-builder: a Nix package to build kernels for PyTorch (made using the PyTorch C++ frontend)
when building models, your regular workflow should be pulling kernels from the Hub and building your model with them 🤗 here's a practical example with RMSNorm:
1. pull the kernel from the Hub with get_kernel
2. decorate with use_kernel_forward_from_hub
3. inject it into your model
a sketch of these steps follows below. we'd love to hear your feedback! 🙏🏻 we also welcome kernel contributions from the community 🥹💗
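A minimal sketch of those three steps; only get_kernel and use_kernel_forward_from_hub come from this post, while the Hub repo id and the plain-PyTorch fallback forward are assumptions for illustration:

```python
import torch
from torch import nn
from kernels import get_kernel, use_kernel_forward_from_hub

# 1. pull a kernel from the Hub (repo id below is a hypothetical example)
rmsnorm_kernel = get_kernel("kernels-community/rmsnorm")

# 2. decorate your layer so its forward can be swapped for the Hub kernel
@use_kernel_forward_from_hub("RMSNorm")
class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # plain PyTorch fallback; the optimized Hub kernel replaces this
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

# 3. inject it into your model like any other nn.Module
norm = RMSNorm(hidden_size=4096)
```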