
Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Recent Activity


reacted to Xenova's post with 🔥 about 13 hours ago

Post

2157

Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU

liked a Space about 14 hours ago

rajatarya/hf-news: HF News (AI-assisted summary)

📰

Stay updated with AI/ML news, summarized by AI

liked 2 models 2 days ago

zai-org/GLM-4.5

Text Generation • 358B • Updated 3 days ago • 6.13k • 775

Qwen/Qwen3-Coder-480B-A35B-Instruct

Text Generation • 480B • Updated 7 days ago • 20.1k • 932

upvoted an article 3 days ago

Article

Vibe coding for data science: how to label a dataset with Kimi K2

9 days ago • 19

liked a Space 5 days ago

Voxtral WebGPU

🐱

State-of-the-art audio transcription in your browser

reacted to merve's post with 👍 8 days ago

Post

2730

Now it's possible to do RAG with any-to-any models 🔥

Learn how to search a video dataset and generate answers using Tevatron/OmniEmbed-v0.1-multivent, an all-modality retriever, and Qwen/Qwen2.5-Omni-7B, an any-to-any model, in this notebook 🤝 merve/smol-vision

liked a model 10 days ago

Tevatron/OmniEmbed-v0.1

Visual Document Retrieval • Updated 14 days ago • 2.93k • 20

posted an update 13 days ago

Post

2078

AudioRAG is becoming real! Just built a demo with ColQwen-Omni that does semantic search on raw audio, no transcription needed.

Drop in a podcast, ask your question, and it finds the exact chunks where it happens. You can also get a written answer.

What’s exciting: it skips transcription, making it faster and better at capturing emotion, ambient sound, and tone, surfacing results text search would miss.

- Demo: fdaudens/colqwen-omni-demo
- Blog post from ColQwen team: https://huggingface.co/blog/manu/colqwen-omni-omnimodal-retrieval
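The retrieval step described in this post (finding the chunks of a podcast that best match a question) can be pictured as a nearest-neighbor search over chunk embeddings. The sketch below is illustrative only. It assumes each audio chunk and the query have already been turned into vectors by some embedding model, and it stands in toy hand-written vectors for real model output. Scoring with cosine similarity over a single vector per chunk is a simplification and not necessarily how the demo ranks results. The names `AudioChunk`, `cosine_similarity`, and `top_k_chunks` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AudioChunk:
    start_s: float          # chunk start time in seconds
    end_s: float            # chunk end time in seconds
    embedding: List[float]  # assumed to come from an audio embedding model


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query: List[float], chunks: List[AudioChunk], k: int = 2) -> List[AudioChunk]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query, c.embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy data: three 30-second chunks with made-up 3-dimensional embeddings.
chunks = [
    AudioChunk(0.0, 30.0, [1.0, 0.0, 0.0]),
    AudioChunk(30.0, 60.0, [0.0, 1.0, 0.0]),
    AudioChunk(60.0, 90.0, [0.7, 0.7, 0.0]),
]
query = [0.9, 0.1, 0.0]  # made-up embedding of the user's question

best = top_k_chunks(query, chunks, k=2)
for chunk in best:
    print(f"{chunk.start_s:.0f}s to {chunk.end_s:.0f}s")
```

Because the query is compared directly against audio embeddings, no transcript is needed at any point; the returned time ranges are what a player would jump to.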

1 reply

updated a Space 13 days ago

AudioRag Demo

🎵

Search audio files for specific queries

liked a Space 13 days ago

221

Whisper WebGPU

🎤

Convert spoken words to text

liked a model 14 days ago

vidore/colqwen-omni-v0.1

Visual Document Retrieval • Updated 14 days ago • 3.74k • 82

published a Space 14 days ago

AudioRag Demo

🎵

Search audio files for specific queries

upvoted 2 articles 14 days ago

Article

Five Big Improvements to Gradio MCP Servers

14 days ago • 18

Article

Introducing ColQwen-Omni: Retrieve in every modality

and 4 others • 14 days ago • 58

upvoted an article 15 days ago

Article

Experimenting with Automatic PII Detection on the Hub using Presidio

and 3 others • Jul 10, 2024 • 25

upvoted a collection 15 days ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

Collection

11 items • Updated May 5 • 95

upvoted an article 16 days ago

Article

What is the Hugging Face Community Building?

and 2 others • 16 days ago • 11

posted an update 16 days ago

Post

2485

You might not have heard of Moonshot AI — but within 24 hours, their new model Kimi K2 shot to the top of Hugging Face’s trending leaderboard.

So… who are they, and why does it matter?

Had a lot of fun co-writing this blog post with @xianbao, with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.

🧵 A few standout facts:

1. From zero to $3.3B in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.

2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI — still a rare ambition among Chinese AI labs.

3. A trillion-parameter model that’s surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active parameters per token) and dominates on coding/math benchmarks.

4. The secret weapon: Muon optimizer:
A new training method that doubles efficiency, cuts memory in half, and supported training on 15.5T tokens with zero failures. Big implications.
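Point 3 above says only a fraction of the model's parameters are active at a time. The toy sketch below illustrates the general idea of top-k expert routing in a mixture-of-experts layer. It is not Kimi K2's actual implementation: the number of experts, the gate scores, and the names `softmax`, `route_top_k`, and `moe_forward` are all made up for illustration.

```python
import math
from typing import Callable, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(
    x: float,
    experts: List[Callable[[float], float]],
    gate_scores: List[float],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Run only the selected experts and mix their outputs by gate weight."""
    routes = route_top_k(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


# Four toy "experts", each just scaling its input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one input

output, used = moe_forward(3.0, experts, gate_scores, k=2)
print(f"experts used: {used}, output: {output:.3f}")
```

Only two of the four experts run for this input, which is the source of the efficiency: total parameter count grows with the number of experts, while compute per input grows only with `k`.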

Most importantly, their move from closed to open source signals a broader shift in China’s AI scene — following Baidu’s pivot. But as Yang puts it: “Users are the only real leaderboard.”

👇 Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained

published an article 16 days ago

Article

5 Things You Need to Know About Moonshot AI and Kimi K2, the New #1 model on the Hub

and 1 other • 16 days ago • 21
