LLHF

community
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

llhf's activity

Wauplin 
posted an update 15 days ago
view post
Post
2054
‼️ huggingface_hub's v0.30.0 is out with our biggest update of the past two years!

Full release notes: https://github.com/huggingface/huggingface_hub/releases/tag/v0.30.0.

🚀 Ready. Xet. Go!

Xet is a groundbreaking new protocol for storing large objects in Git repositories, designed to replace Git LFS. Unlike LFS, which deduplicates files, Xet operates at the chunk level—making it a game-changer for AI builders collaborating on massive models and datasets. Our Python integration is powered by [xet-core](https://github.com/huggingface/xet-core), a Rust-based package that handles all the low-level details.

You can start using Xet today by installing the optional dependency:

pip install -U huggingface_hub[hf_xet]


With that, you can seamlessly download files from Xet-enabled repositories! And don’t worry—everything remains fully backward-compatible if you’re not ready to upgrade yet.

Blog post: https://huggingface.co/blog/xet-on-the-hub
Docs: https://huggingface.co/docs/hub/en/storage-backends#xet


⚡ Inference Providers

- We’re thrilled to introduce Cerebras and Cohere as official inference providers! This expansion strengthens the Hub as the go-to entry point for running inference on open-weight models.

- Novita is now our 3rd provider to support text-to-video task after Fal.ai and Replicate.

- Centralized billing: manage your budget and set team-wide spending limits for Inference Providers! Available to all Enterprise Hub organizations.

from huggingface_hub import InferenceClient
client = InferenceClient(provider="fal-ai", bill_to="my-cool-company")
image = client.text_to_image(
    "A majestic lion in a fantasy forest",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("lion.png")


- No more timeouts when generating videos, thanks to async calls. Available right now for Fal.ai, expecting more providers to leverage the same structure very soon!
·
clefourrier 
posted an update about 1 month ago
view post
Post
2171
Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards on the other hand, comparison will be apples to apples, but in a potentially unoptimal way for a given model family (like some user interact sub-optimally with models)

Also contains a cool section (6) on training data memorization rate too! Important to see if your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.
alvarobartt 
posted an update about 2 months ago
view post
Post
3024
🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents designed to handle complex interactions across virtual and real environments; and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn the spatial-temporal grounding and planning
- A strong generalization and ability to be fine-tuned for other agentic tasks
- SOTA in different multi-modal benchmarks spanning across UI navigation, robotics manipulation, image / video understanding and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)
lysandre 
posted an update about 2 months ago
view post
Post
6270
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
  • 1 reply
·
Xenova 
posted an update 2 months ago
view post
Post
12918
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
·
victor 
posted an update 2 months ago
view post
Post
5431
Hey everyone, we've given https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do—like "make a viral meme" or "generate music"—and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We’d love to hear what you think—drop us some feedback plz!
·
victor 
posted an update 3 months ago
view post
Post
3107
Finally, an open-source AI that turns your lyrics into full songs is here—meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot
Xenova 
posted an update 3 months ago
view post
Post
7016
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
👉 npm i kokoro-js 👈

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! 🤗

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯
·
Xenova 
posted an update 3 months ago
Xenova 
posted an update 4 months ago
view post
Post
4319
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
·
Xenova 
posted an update 4 months ago
view post
Post
3338
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)