✨ Baidu & MiniMax both launched open foundation models
- Baidu: ERNIE 4.5 (from 0.3B to 424B) 🤯
- MiniMax: MiniMax-M1 (hybrid MoE reasoning model)
✨ Multimodal AI is moving from fusion to full-stack reasoning: unified any-to-any pipelines across text, vision, audio, and 3D
- Baidu: ERNIE-4.5-VL-424B
- Moonshot AI: Kimi-VL-A3B
- Alibaba: Ovis-U1
- BAAI: Video-XL-2 / OmniGen2
- Ant Group: Ming-Lite-Omni
- Chinese Academy of Sciences: Stream-Omni
- ByteDance: SeedVR2-3B
- Tencent: Hunyuan 3D 2.1 / SongGeneration
- FishAudio: OpenAudio-S1-mini
✨ Domain-specific models are rapidly emerging
- Alibaba DAMO: Lingshu-7B (medical MLLM)
- BAAI: RoboBrain (robotics)
✨ So many small models!
- OpenBMB: MiniCPM4 (on-device)
- Qwen: Embedding / Reranker (0.6B)
- Alibaba: Ovis-U1-3B
- Moonshot AI: Kimi-VL-A3B
- ByteDance: SeedVR2-3B
Inference for generative AI models looks like a minefield, but there's a simple protocol for picking the best setup:
🌍 95% of users >> If you're using open (large) models and need fast online inference, use Inference Providers in auto mode and let it choose the best provider for the model (see the sketch after this list). https://huggingface.co/docs/inference-providers/index
👷 Fine-tuners / bespoke setups >> If you've got a custom setup, use Inference Endpoints to define a configuration on AWS, Azure, or GCP. https://endpoints.huggingface.co/
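To make the auto-mode path concrete, here's a minimal sketch using huggingface_hub's InferenceClient. The model id is just an example, and you'll need a valid HF token (here assumed to live in the HF_TOKEN environment variable):

```python
import os
from huggingface_hub import InferenceClient

# provider="auto" lets the Hub route the request to the best
# available inference provider for this model.
client = InferenceClient(provider="auto", token=os.environ["HF_TOKEN"])

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model id, swap in your own
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=50,
)
print(response.choices[0].message.content)
```

The nice part of auto mode is that the same snippet keeps working as providers come and go; you only change the model id.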
This is what efficient AI looks like: Gemma 3n just dropped - a natively multimodal model that runs entirely on your device. No cloud. No API calls.
🧠 Text, image, audio, and video - handled locally.
⚡️ Needs only ~2GB of GPU memory to run
🤯 First sub-10B model to hit 1300+ Elo (LMArena)
✅ Plug-and-play with Hugging Face, MLX, llama.cpp, and more.
Plus: multilingual out of the box (140+ languages), and you can fine-tune it in a free Colab notebook.
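For the Hugging Face route, a minimal local-inference sketch with transformers. The model id google/gemma-3n-E2B-it and the image URL are assumptions (the model is gated, so accept the license on the Hub first):

```python
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",            # Gemma 3n is natively multimodal
    model="google/gemma-3n-E2B-it",  # assumed model id; check the Hub
    device_map="auto",               # uses GPU if available, else CPU
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
# generated_text holds the full chat; the last turn is the model's reply
print(out[0]["generated_text"][-1]["content"])
```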
It's been a while since I took a step back and looked at the xet-team's progress migrating Hugging Face from Git LFS to Xet, but every time I do, it boggles the mind.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB of data. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB of data
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June we hit upload speeds of 577 Gb/s, crossing 500 Gb/s for the first time.
These are hard numbers to put into context, but let's try:
The latest crawl from Common Crawl was 471 TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
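A quick back-of-the-envelope check of that claim, using only the figures above:

```python
# How long does 471 TB take at 577 Gb/s?
crawl_bytes = 471e12                      # 471 TB
peak_bits_per_s = 577e9                   # 577 Gb/s (bits, not bytes)
peak_bytes_per_s = peak_bits_per_s / 8    # ≈ 72 GB/s

seconds = crawl_bytes / peak_bytes_per_s
print(f"{seconds / 3600:.1f} hours")      # ≈ 1.8 hours
```

So "about two hours" checks out, assuming the peak rate could be sustained end to end.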
We're moving to a new phase in the process, so stay tuned.
This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.
I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you, @bartowski 👀).
Let me know if there's anything you're interested in; happy to dig in!