ravi4198 (Ravi)

liked a model about 1 month ago

moondream/moondream-2b-2025-04-14-4bit

Image-Text-to-Text • 1B • Updated May 22 • 12.6k • 52

liked 4 models 3 months ago

liked 3 models 4 months ago

onnx-community/Phi-4-mini-instruct-ONNX-GQA

Text Generation • Updated Mar 1 • 76 • 4

microsoft/Phi-4-multimodal-instruct-onnx

Automatic Speech Recognition • Updated 20 days ago • 128 • 73

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated May 1 • 488k • 1.44k

reacted to smirki's post with 🔥 4 months ago

Post

3415

UIGEN for Tailwind v4 is coming soon!

2 replies

reacted to fdaudens's post with 👍 5 months ago

Post

2140

🔊 Meet Kokoro Web - Free, ML speech synthesis on your computer, that'll make you ditch paid services!

28 natural voices, unlimited generations, and WebGPU acceleration. Perfect for journalists and content creators.

Test it with full articles—sounds amazingly human! 🎯🎙️

Xenova/kokoro-web

reacted to hexgrad's post with 🔥 5 months ago

Post

7190

Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy

4 replies

reacted to Xenova's post with 🔥 5 months ago

Post

13551

We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?

11 replies

New activity in onnx-community/Kokoro-82M-v1.0-ONNX 5 months ago

Appreciation

👍 🤗 2

#1 opened 5 months ago by ravi4198

liked a model 5 months ago

onnx-community/Kokoro-82M-v1.0-ONNX

Text-to-Speech • Updated Feb 8 • 121k • 106

upvoted a paper 5 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 235

updated a model 5 months ago

onnx-community/mms-tts-kan-ONNX

Text-to-Speech • Updated Jan 31 • 11

reacted to m-ric's post with 👍 5 months ago

Post

3456

Today we make the biggest release in smolagents so far: 𝘄𝗲 𝗲𝗻𝗮𝗯𝗹𝗲 𝘃𝗶𝘀𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹𝘀, 𝘄𝗵𝗶𝗰𝗵 𝗮𝗹𝗹𝗼𝘄𝘀 𝘁𝗼 𝗯𝘂𝗶𝗹𝗱 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝘄𝗲𝗯 𝗯𝗿𝗼𝘄𝘀𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁𝘀! 🥳

Our agents can now casually open up a web browser, and navigate on it by scrolling, clicking elements on the webpage, going back, just like a user would.

The demo below shows Claude-3.5-Sonnet browsing GitHub for the task: "Find how many commits the author of the current top trending repo did over last year."
Hi @mlabonne !

Go try it out, it's the most cracked agentic stuff I've seen in a while 🤯 (well, along with OpenAI's Operator, which beat us by one day)

For more detail, read our announcement blog 👉 https://huggingface.co/blog/smolagents-can-see
The code for the web browser example is here 👉 https://github.com/huggingface/smolagents/blob/main/examples/vlm_web_browser.py

3 replies

reacted to onekq's post with 🔥 5 months ago

Post

2748

This is historic. 🎉

DeepSeek 🐋R1🐋 surpassed OpenAI 🍓o1🍓 on the dual leaderboard. What a year for open source!

onekq-ai/WebApp1K-models-leaderboard

reacted to onekq's post with 🔥 5 months ago

Post

4877

🐋DeepSeek 🐋 is the real OpenAI 😯

6 replies

liked a model 5 months ago

deepseek-ai/DeepSeek-R1

Text Generation • 685B • Updated Mar 27 • 555k • 12.4k

Ravi

AI & ML interests

Recent Activity

Organizations

meta-llama/Llama-4-Scout-17B-16E-Instruct

meta-llama/Llama-4-Maverick-17B-128E

google/gemma-3-27b-it

microsoft/llava-rad


ravi4198's activity