Hugging Face - Visual Blocks

AI & ML interests

None defined yet.

hf-vb's activity

Xenova posted an update 1 day ago
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82-million-parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
👉 npm i kokoro-js 👈

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports, and Hexgrad for the amazing project! None of this would be possible without you all! 🤗

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯
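As a rough sanity check on those sizes: weight storage scales with parameter count times bits per weight. The sketch below assumes every one of the 82 million parameters is stored at the dtype's nominal width and ignores unquantized layers, graph metadata, and file overhead, so it is an idealization rather than a description of how the ONNX exports are laid out. The mixed-precision `q4f16` option has no single width, so it is left out.

```javascript
// Nominal bytes per weight for each dtype (assumption: uniform precision).
const BYTES_PER_WEIGHT = { fp32: 4, fp16: 2, q8: 1, q4: 0.5 };

// Idealized size in megabytes (1 MB = 1e6 bytes) for a model whose weights
// are all stored at the given dtype.
function estimateSizeMB(numParams, dtype) {
  const bytes = BYTES_PER_WEIGHT[dtype];
  if (bytes === undefined) {
    throw new Error(`No uniform width known for dtype "${dtype}"`);
  }
  return (numParams * bytes) / 1e6;
}

for (const dtype of Object.keys(BYTES_PER_WEIGHT)) {
  console.log(dtype, estimateSizeMB(82e6, dtype), "MB");
}
// fp32 328 MB, fp16 164 MB, q8 82 MB, q4 41 MB
```

The fp32 estimate of 328 MB is close to the 326 MB original. Real quantized files can differ from these idealized figures, since an export may keep some tensors at higher precision.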
Xenova posted an update 17 days ago
First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. It shows what the model focuses on when making predictions and provides insight into its inner workings! 🤯

Try it out yourself! 👇
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
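One common way to turn a ViT's self-attention into a heatmap is to take one head's attention weights from the [CLS] token to every image patch, drop the [CLS]-to-[CLS] entry, reshape the rest into the patch grid, and min-max normalize for display. This is a sketch of that general technique, not the app's actual code; the input layout (one row of length 1 + gridSize², [CLS] first, patches in row-major order) is an assumption. For example, a 224×224 input split into 16×16 patches gives a 14×14 grid.

```javascript
// Convert one [CLS] attention row into a normalized gridSize x gridSize map.
// Assumption: row[0] is [CLS]->[CLS]; row[1..] are [CLS]->patch weights in
// row-major patch order.
function clsAttentionToGrid(row, gridSize) {
  const patches = row.slice(1);
  if (patches.length !== gridSize * gridSize) {
    throw new Error(`Expected ${gridSize * gridSize} patch weights, got ${patches.length}`);
  }
  const min = Math.min(...patches);
  const max = Math.max(...patches);
  const range = max - min || 1; // avoid dividing by zero on a flat map
  const grid = [];
  for (let r = 0; r < gridSize; r++) {
    const start = r * gridSize;
    grid.push(patches.slice(start, start + gridSize).map((w) => (w - min) / range));
  }
  return grid;
}

// Toy values for a 2x2 patch grid: [CLS] entry first, then four patches.
console.log(clsAttentionToGrid([9, 0, 1, 2, 4], 2)); // → [[0, 0.25], [0.5, 1]]
```

Min-max scaling makes the brightest patch 1 regardless of the head's absolute weights, which suits visual comparison but hides how peaked the original distribution was.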
Xenova posted an update about 1 month ago
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
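"WebGPU accelerated (w/ WASM fallback)" boils down to a capability check before loading the model. A minimal sketch, not the demo's code: it uses the standard WebGPU entry point `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU is available, and takes the navigator object as a parameter so the logic can be exercised outside a browser.

```javascript
// Pick "webgpu" when the browser exposes a usable GPU adapter, else "wasm".
// `nav` is injected (pass `navigator` in a browser) so the logic is testable.
async function pickDevice(nav) {
  if (!nav || !nav.gpu) return "wasm"; // WebGPU API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // API present but no adapter available
  } catch {
    return "wasm"; // treat adapter errors as "no WebGPU"
  }
}
```

In a browser, the result of `await pickDevice(navigator)` could then be passed as the `device` option when loading a Transformers.js v3 pipeline, e.g. `pipeline(task, model, { device })`. Checking for an adapter, not just for `navigator.gpu`, matters because the API can be present on machines where no GPU adapter is actually usable.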
Xenova posted an update about 1 month ago
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)
Xenova posted an update about 2 months ago
We just released Transformers.js v3.1, and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything runs 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU
Xenova posted an update 2 months ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
Xenova posted an update 5 months ago
I can't believe this... Phi-3.5-mini (3.8B) running in-browser at ~90 tokens/second on WebGPU w/ Transformers.js and ONNX Runtime Web! 🤯 Since everything runs 100% locally, no messages are sent to a server, which is a huge win for privacy!
- 🤗 Demo: webml-community/phi-3.5-webgpu
- 🧑‍💻 Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu
Xenova posted an update 5 months ago
I'm excited to announce that Transformers.js v3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
πš—πš™πš– πš’ @πš‘πšžπšπšπš’πš—πšπšπšŠπšŒπšŽ/πšπš›πšŠπš—πšœπšπš˜πš›πš–πšŽπš›πšœ

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo: webml-community/segment-anything-webgpu
Xenova posted an update 6 months ago
Introducing Whisper Diarization: Multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!

Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization
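Combining the two outputs comes down to labeling each timestamped word with the speaker whose segment overlaps it most. This sketch is not the demo's code: it assumes word chunks shaped like `{ text, timestamp: [start, end] }` in seconds (assumed to match the `chunks` output of a Transformers.js speech-recognition pipeline called with `return_timestamps: "word"`) and hypothetical speaker segments shaped like `{ label, start, end }`.

```javascript
// Length of the overlap between [aStart, aEnd] and [bStart, bEnd] (0 if disjoint).
function overlap(aStart, aEnd, bStart, bEnd) {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Attach a speaker label to every word; words with no overlap get "unknown".
function assignSpeakers(words, segments) {
  return words.map((word) => {
    const [start, end] = word.timestamp;
    let best = { label: "unknown", amount: 0 };
    for (const seg of segments) {
      const amount = overlap(start, end, seg.start, seg.end);
      if (amount > best.amount) best = { label: seg.label, amount };
    }
    return { ...word, speaker: best.label };
  });
}

const words = [
  { text: "Hello", timestamp: [0.0, 0.5] },
  { text: "there", timestamp: [0.5, 1.0] },
  { text: "Hi", timestamp: [1.2, 1.6] },
];
const segments = [
  { label: "SPEAKER_00", start: 0.0, end: 1.1 },
  { label: "SPEAKER_01", start: 1.1, end: 2.0 },
];
console.log(assignSpeakers(words, segments).map((w) => `${w.speaker}: ${w.text}`));
// → speakers: SPEAKER_00, SPEAKER_00, SPEAKER_01
```

Using the largest overlap (rather than, say, the word's midpoint) keeps words that straddle a speaker change with whichever speaker covers most of them; on an exact tie, the earlier segment wins because the comparison is strict.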
Xenova posted an update 6 months ago
Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! Check it out!
👉 Xenova/whisper-word-level-timestamps 👈

This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build? 😍

Source code: https://github.com/xenova/transformers.js/tree/v3/examples/whisper-word-timestamps
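For video editing, word-level timestamps map naturally onto subtitle cues. Assuming word chunks shaped like `{ text, timestamp: [start, end] }` with both times present and given in seconds (an assumption about the input, not taken from the demo's source), a minimal WebVTT writer might look like this; grouping several words per cue is left out for brevity.

```javascript
// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// One cue per word chunk; leading/trailing spaces in chunk text are trimmed.
function chunksToWebVTT(chunks) {
  const cues = chunks.map(({ text, timestamp: [start, end] }) =>
    `${toVttTime(start)} --> ${toVttTime(end)}\n${text.trim()}`);
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

console.log(chunksToWebVTT([
  { text: " Hello", timestamp: [0.0, 0.42] },
  { text: " world", timestamp: [0.42, 1.05] },
]));
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point artifacts such as `1.05 * 1000` not being exactly 1050.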
Xenova posted an update 6 months ago
Xenova posted an update 7 months ago
Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯

It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW!
- Demo: Xenova/florence2-webgpu
- Models: https://huggingface.co/models?library=transformers.js&other=florence2
- Source code: https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu
Xenova posted an update 7 months ago
Introducing Whisper WebGPU: Blazingly fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu
radames posted an update 8 months ago
Thanks to @OzzyGT for pushing the new Anyline preprocessor to https://github.com/huggingface/controlnet_aux. Now you can use the TheMistoAI/MistoLine ControlNet end to end with Diffusers.

Here's a demo for you: radames/MistoLine-ControlNet-demo
Super resolution version: radames/Enhance-This-HiDiffusion-SDXL

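from controlnet_aux import AnylineDetector
from PIL import Image

# Load the Anyline preprocessor weights (MTEED.pth) from the MistoLine repo
anyline = AnylineDetector.from_pretrained(
    "TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline"
).to("cuda")

# Extract line art from the input image to use as the ControlNet condition
source = Image.open("source.png")
result = anyline(source, detect_resolution=1280)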
radames updated a Space 8 months ago
radames posted an update 8 months ago
At Google I/O 2024, we're collaborating with the Google Visual Blocks team (https://visualblocks.withgoogle.com) to release custom Hugging Face nodes. Visual Blocks for ML is a browser-based tool that lets users build machine learning pipelines through a visual interface. We're launching nodes that use Transformers.js to run models in the browser, as well as server-side nodes that run Transformers pipeline tasks and LLMs through our hosted inference. Together with @Xenova and @JasonMayes.

You can learn more about it here: https://huggingface.co/blog/radames/hugging-face-google-visual-blocks

Source code for the custom nodes:
https://github.com/huggingface/visual-blocks-custom-components
radames posted an update 8 months ago
AI-town now runs on Hugging Face Spaces with our API for LLMs and embeddings, including the open-source Convex backend, all in one container. It's easy to duplicate and configure on your own.

Demo: radames/ai-town
Instructions: https://github.com/radames/ai-town-huggingface
Xenova posted an update 8 months ago
Introducing Phi-3 WebGPU, a private and powerful AI chatbot that runs 100% locally in your browser, powered by 🤗 Transformers.js and onnxruntime-web!

🔒 On-device inference: no data sent to a server
⚡️ WebGPU-accelerated (> 20 tokens/s)
📥 Model downloaded once and cached

Try it out: Xenova/experimental-phi3-webgpu
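"Model downloaded once and cached" describes a cache-first fetch: look in a local cache and only go to the network on a miss. The sketch below is a generic illustration, not Transformers.js's internal implementation. It assumes a Cache-API-like object with `match` and `put` (in a browser, `await caches.open(name)` returns one) and takes the fetch function as a parameter so it can run outside a browser.

```javascript
// Return a cached response if present; otherwise fetch, store a copy, and return it.
async function cacheFirstFetch(url, cache, fetchFn) {
  const cached = await cache.match(url);
  if (cached) return cached;
  const response = await fetchFn(url);
  // Only cache successful responses; clone() because a body can be read only once.
  if (response.ok) await cache.put(url, response.clone());
  return response;
}
```

In a browser this might be called as `cacheFirstFetch(modelUrl, await caches.open("model-cache"), fetch)`, where `modelUrl` and the cache name `"model-cache"` are hypothetical. Skipping failed responses prevents an error page from being served as model weights on every later load.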
radames posted an update 8 months ago
HiDiffusion SDXL now supports Image-to-Image, so I've created an "Enhance This" version using the latest ControlNet Line Art model, MistoLine. It's faster than DemoFusion.

Demo: radames/Enhance-This-HiDiffusion-SDXL

Older version based on DemoFusion: radames/Enhance-This-DemoFusion-SDXL

New SDXL ControlNet that controls every line: TheMistoAI/MistoLine

HiDiffusion is compatible with Diffusers and supports many SD models: https://github.com/megvii-research/HiDiffusion
radames posted an update 9 months ago
I've built a custom component that integrates the Rerun web viewer with Gradio, making it easier to share your demos as Gradio apps.

Basic snippet:
# pip install gradio_rerun gradio
import gradio as gr
from gradio_rerun import Rerun

gr.Interface(
    inputs=gr.File(file_count="multiple", type="filepath"),
    outputs=Rerun(height=900),
    fn=lambda file_path: file_path,
).launch()

More details here: radames/gradio_rerun
Source: https://github.com/radames/gradio-rerun-viewer

Follow Rerun here: https://huggingface.co/rerun