AI & ML interests

The AI community building the future.

Recent Activity

- Use with MCP markdown update: #36 opened about 1 hour ago by Chunte in huggingface/badges
- Use with MCP badge assets: #35 opened about 2 hours ago by Chunte in huggingface/badges
- Add `deploy-smollm3` images: #520 opened about 6 hours ago by juanjucm

jsulz 
posted an update 1 day ago
We've moved over 20PB from Git LFS to Xet on the Hub without downtime or data loss. Having things "just work" on a migration of this scale is about as good as it gets.

Now, we're migrating the rest of the Hub: https://huggingface.co/blog/migrating-the-hub-to-xet

But how did we get here?

In the early days of joining Hugging Face, we made a few key design decisions:
* There would be no "hard cut-over" from Git LFS to Xet
* A Xet-enabled repository should be able to contain both Xet and LFS files
* Repository migrations from LFS to Xet can run in the background without disrupting downloads or uploads

These were largely driven by our desire to ensure the community could keep working without interruption.

In this post, we cover the infrastructure that makes this all work, specifically:
* An integral piece of infrastructure known internally as the Git LFS Bridge
* Background content migrations that run around the clock

To skip the wait and join Xet now, sign up here: https://huggingface.co/join/xet
pagezyhf 
posted an update 2 days ago
🎉 New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2

We're excited to welcome Parakeet TDT 0.6B V2, a state-of-the-art English speech-to-text model, to the Azure Foundry Model Catalog.

What is it?

A powerful ASR model built on the FastConformer-TDT architecture, offering:
🕒 Word-level timestamps
✍️ Automatic punctuation & capitalization
🔊 Strong performance across noisy and real-world audio

It runs with NeMo, NVIDIA's toolkit for conversational AI and speech models.

Want to give it a try? 🎧 You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying. If it fits your needs, deploy it easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!

📘 Learn more by following this guide written by @alvarobartt

https://huggingface.co/docs/microsoft-azure/azure-ai/examples/deploy-nvidia-parakeet-asr
AdinaY 
posted an update 5 days ago
Kimi-K2 is now available on the hub🔥🚀
This is a trillion-parameter MoE model focused on long context, code, reasoning, and agentic behavior.

moonshotai/kimi-k2-6871243b990f2af5ba60617d

✨ Base & Instruct
✨ 1T total / 32B active - Modified MIT License
✨ 128K context length
✨ Muon optimizer for stable trillion-scale training
albertvillanova 
posted an update 5 days ago
🚀 New in smolagents v1.20.0: Remote Python Execution via WebAssembly (Wasm)

We've just merged a major new capability into the smolagents framework: the CodeAgent can now execute Python code remotely in a secure, sandboxed WebAssembly environment!

🔧 Powered by Pyodide and Deno, this new WasmExecutor lets your agent-generated Python code run safely, without relying on Docker or local execution.

Why this matters:
✅ Isolated execution = no host access
✅ No need for Python on the user's machine
✅ Safer evaluation of arbitrary code
✅ Compatible with serverless / edge agent workloads
✅ Ideal for constrained or untrusted environments

This is just the beginning: a focused initial implementation with known limitations, but a solid MVP designed for secure, sandboxed use cases.

💡 We're inviting the open-source community to help evolve this executor:
• Tackle more advanced Python features
• Expand compatibility
• Add test coverage
• Shape the next-gen secure agent runtime

🔗 Check out the PR: https://github.com/huggingface/smolagents/pull/1261

Let's reimagine what agent-driven Python execution can look like: remote-first, wasm-secure, and community-built.

This feature is live in smolagents v1.20.0!
Try it out.
Break things. Extend it. Give us feedback.
Let's build safer, smarter agents, together 🧠⚙️

👉 https://github.com/huggingface/smolagents/releases/tag/v1.20.0

#smolagents #WebAssembly #Python #AIagents #Pyodide #Deno #OpenSource #HuggingFace #AgenticAI
dylanebert 
posted an update 7 days ago
dylanebert/3d-arena now supports topology-only voting and ranking!

Let's see which Gen3D model produces the best topology
pagezyhf 
posted an update 7 days ago
If you want to dive into how the HF team worked with @seungrokj at @AMD
to optimize kernels on MI300, give our latest blog a read!

Such great educational material for anyone curious about the world of low-level ML optimization.

https://huggingface.co/blog/mi300kernels
a-r-r-o-w 
posted an update 8 days ago
Caching is an essential technique used in diffusion inference serving for speeding up image/video generation. Diffusers just added support for another caching method: First Block Cache, a technique developed by @chengzeyi building upon the ideas of TeaCache.

The idea, in short: if the model's predictions do not vary much over successive inference steps, we can skip the steps where the prediction difference is small. To decide whether an inference step will make a significant difference to the overall velocity/noise prediction, we compute the relative difference between the output of the first transformer block at timestep $t$ and at $t-1$, and compare it against a chosen threshold. If the difference is below the threshold, we skip the step. A higher threshold leads to more skipped steps, but skipping too many steps can throw off the model's predictions, so the threshold must be tuned per model for the desired quality-speed tradeoff.
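As a framework-free sketch of that skip rule (simplified: the real implementation compares the first transformer block's hidden states and reuses a cached residual, while here we just flatten the outputs to lists of floats), the decision boils down to a relative-difference check:

```python
def should_skip_step(prev_out, curr_out, threshold=0.2):
    """Decide whether to reuse the cached prediction for this timestep.

    prev_out / curr_out: first-block outputs at timesteps t-1 and t,
    flattened to lists of floats. If their relative L1 difference is
    below the threshold, the remaining blocks are skipped and the
    cached result is reused.
    """
    diff = sum(abs(a - b) for a, b in zip(prev_out, curr_out))
    norm = sum(abs(a) for a in prev_out) + 1e-8  # avoid division by zero
    return diff / norm < threshold

# Nearly identical first-block outputs -> skip the rest of the step
print(should_skip_step([1.0, 2.0, 3.0], [1.01, 2.0, 3.0]))  # True
# Large change -> run the full forward pass
print(should_skip_step([1.0, 2.0, 3.0], [2.0, 0.5, 3.0]))   # False
```

With `threshold=0.2`, only changes exceeding 20% of the previous output's magnitude trigger a full forward pass; raising the threshold skips more steps at the cost of prediction quality.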

Diffusers usage with CogView4:

import torch
from diffusers import CogView4Pipeline
from diffusers.hooks import apply_first_block_cache, FirstBlockCacheConfig

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

apply_first_block_cache(pipe.transformer, FirstBlockCacheConfig(threshold=0.2))

prompt = "A photo of an astronaut riding a horse on mars"
image = pipe(prompt, generator=torch.Generator().manual_seed(42)).images[0]
image.save("output.png")


Below, you'll find the benchmarks and visualizations of the predicted output at different blocks of the Flux DiT.

Docs: https://huggingface.co/docs/diffusers/main/en/optimization/cache
PR: https://github.com/huggingface/diffusers/pull/11180

References:
- First Block Cache: https://github.com/chengzeyi/ParaAttention
- TeaCache: https://github.com/ali-vilab/TeaCache
AdinaY 
posted an update 8 days ago
The tech report of RoboBrain 2.0 is now available on the Daily Papers page🔥

It's an embodied brain model that sees, thinks, and plans for many kinds of robots.

Leave your insights or questions; the authors are happy to respond.
RoboBrain 2.0 Technical Report (2507.02029)
AdinaY 
posted an update 8 days ago
POLAR🐻‍❄️ New reward modeling by Shanghai AI Lab

internlm/polar-68693f829d2e83ac5e6e124a

✨ 1.8B/7B - Apache 2.0
✨ Scalable policy discriminative pretraining
✨ Easy RLHF with minimal preference data
giadap 
posted an update 8 days ago
I've been posting bits and pieces about this research, but now I can finally say: new paper alert 🚨

My colleague @brunatrevelin and I just shared a paper exploring why traditional consent frameworks are breaking down in AI contexts (forthcoming chapter in a collective book).

The current model places impossible burdens on users to manage countless consent decisions. Meanwhile, AI systems learn to mimic our voices and writing styles from data we unknowingly provided years ago.

What's next? We need to shift from individual responsibility to collective accountability.

This means:
- Organizations designing systems that respect human agency by default
- Developers building ethics into models from the start
- Policymakers creating frameworks beyond minimal compliance

Blog post: https://huggingface.co/blog/giadap/consentful-ai
Paper: Can AI be Consentful? (2507.01051)
andito 
posted an update 13 days ago
🧠👁️ Can AI visualize solutions?

Humans often solve visual problems by sketching ideas in our minds. What if Vision-Language Models (VLMs) could do something similar, not by generating full images, but by using internal “mental sketches”?

That’s the idea behind Mirage, a new framework that empowers VLMs to reason using latent visual tokens. Instead of just thinking in words, Mirage mixes in abstract visual representations that help the model solve complex tasks.

These aren't photorealistic images. They're compact, internal representations optimized purely to support reasoning.

🔧 Mirage is trained in two phases:

1) Grounding: It learns to produce latent tokens anchored in real images.
2) Refinement: The model drops the images and learns to generate visual tokens on its own.

📈 And yes, it works!
On challenging benchmarks like Visual Spatial Planning, Jigsaw puzzles, and Spatial Attention Tasks, Mirage clearly outperforms GPT-4o and other strong baselines.
Smart sketches > empty words.

By mimicking the way humans visualize solutions, Mirage gives AI a new kind of imagination, one that’s faster, more efficient, and more human-like.
Kudos to the teams at UMass Amherst and MIT behind this exciting work.
Check the paper: Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (2506.17218)
AdinaY 
posted an update 13 days ago
The Chinese Open Source Heatmap is live 🔥
You can now track the companies, research labs, and communities powering China's open source AI movement.

zh-ai-community/model-release-heatmap-zh

Some highlights:

✨ Tech giants are investing more in open source.
- Alibaba: full-stack open ecosystem
- Tencent: Hunyuan image/video/3D
- Bytedance: catching up fast in 2025
- Baidu: new player in open LLMs

✨ New players are emerging after the DeepSeek moment.
- Xiaomi
- Red Note
- Bilibili
- MiniMax
- Moonshot AI

✨ The startup list is shifting fast! Those who find a direction aligned with their strengths are the ones who endure.
- DeepSeek
- MiniMax
- StepFun
- Moonshot AI
- Zhipu AI
- OpenBMB

✨ Research labs & communities are making key contributions.
- BAAI
- Shanghai AI Lab
- OpenMOSS
- MAP
AdinaY 
posted an update 14 days ago
🔥 June highlights from China’s open source ecosystem.

zh-ai-community/june-2025-open-works-from-the-chinese-community-683d66c188f782dc5570ba15

✨ Baidu & MiniMax both launched open foundation models
- Baidu: Ernie 4.5 (from 0.3B to 424B) 🤯
- MiniMax: MiniMax-M1 (hybrid MoE reasoning model)

✨ Multimodal AI is moving from fusion to full-stack reasoning: unified any-to-any pipelines across text, vision, audio, and 3D
- Baidu: ERNIE-4.5-VL-424B
- Moonshot AI: Kimi-VL-A3B
- Alibaba: Ovis-U1
- BAAI: Video-XL-2/OmniGen2
- AntGroup: Ming-Lite-Omni
- Chinese Academy of Sciences: Stream-Omni
- Bytedance: SeedVR2-3B
- Tencent: Hunyuan 3D 2.1/ SongGeneration
- FishAudio: Openaudio-s1-mini

✨ Domain-specific models are rapidly emerging
- Alibaba DAMO: Lingshu-7B (medical MLLM)
- BAAI: RoboBrain (robotics)

✨ So many small models!
- OpenBMB: MiniCPM4 (on-device)
- Qwen: Embedding/Reranker (0.6B)
- Alibaba: Ovis-U1-3B
- Moonshot AI: Kimi-VL-A3B
- Bytedance: SeedVR2-3B