IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥
> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license 👏 use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M
Tremendous quality-of-life upgrade on the Hugging Face Hub - we now have auto-complete emojis 🤗 🥳 👏 🙌 🎉
Get ready for lots more very serious analysis on a whole range of topics from yours truly now that we have unlocked this full range of expression 😄 🤔 🗣 🙊
The Motif 2.6B tech report is pretty insane, first time I've seen a model with differential attention and PolyNorm trained at scale!
> It's trained on 2.5T tokens, with a "data mixture schedule" to continuously adjust the mixture over training.
> They use WSD with a "simple moving average", averaging the last 6 checkpoints every 8B tokens.
> They trained on FineMath, FineWeb2, DCLM and TxT360.
> Lots of detail on the finetuning data they used: for instance, they used EvolKit and did some "dataset fusion" to pack more compressed knowledge into the data.
> They mention they also tried Normalized GPT, QK-Norm and Cross Layer Attention.
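The checkpoint-averaging trick above can be sketched in a few lines. This is an illustrative toy (the function name and the scalar "parameters" are mine, not from the report): a simple moving average over the last 6 saved checkpoints, each checkpoint modeled as a dict of parameter name to value.

```python
# Hedged sketch of simple-moving-average (SMA) checkpoint averaging, as
# described in the Motif report: average the last 6 checkpoints, saved
# every ~8B tokens. `sma_checkpoints` is an illustrative name, not an
# actual API; real checkpoints hold tensors, not floats.

def sma_checkpoints(checkpoints, window=6):
    """Average the last `window` checkpoints (dicts of name -> value)."""
    recent = checkpoints[-window:]
    n = len(recent)
    return {
        name: sum(ckpt[name] for ckpt in recent) / n
        for name in recent[0]
    }

# Toy usage: ten checkpoints whose single "weight" is just its index.
ckpts = [{"w": float(i)} for i in range(10)]
avg = sma_checkpoints(ckpts, window=6)
print(avg["w"])  # mean of 4..9 = 6.5
```

In practice you'd do the same reduction over every tensor in the state dict; the point is that the averaged weights, not any single checkpoint, become the evaluated/released model.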
🚀 smolagents v1.21.0 is here!
Now with improved safety in the local Python executor: dunder calls are blocked! ⚠️
Still, it's not fully isolated: for untrusted code, use a remote executor instead (Docker, E2B, Wasm).
✨ Many bug fixes for more reliable code.
👉 https://github.com/huggingface/smolagents/releases/tag/v1.21.0
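To give a feel for what "dunder calls are blocked" means, here is a minimal AST-based sketch of the idea (this is NOT smolagents' actual implementation, and `has_dunder_access` is a hypothetical helper): dunder attributes like `__class__` or `__subclasses__` are the classic escape hatch out of a restricted Python sandbox, so a checker can reject any attribute access whose name is a double-underscore name before executing the code.

```python
import ast

# Illustrative sketch only, not smolagents' real checker: flag any
# attribute access whose name starts and ends with "__", since chains
# like [].__class__.__mro__ are the usual sandbox-escape route.

def has_dunder_access(code: str) -> bool:
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            if node.attr.startswith("__") and node.attr.endswith("__"):
                return True
    return False

print(has_dunder_access("x = [].__class__"))  # True  -> would be rejected
print(has_dunder_access("x = len([1, 2])"))   # False -> allowed
```

Even with checks like this, a shared interpreter is never a real security boundary, which is why the release notes still point untrusted code at Docker, E2B, or Wasm executors.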