KV Caching Explained: Optimizing Transformer Inference Efficiency
β’
12
It seems to be working on my side. You can either read the full blog post at https://huggingface.co/blog/not-lain/tensor-dims, or click on the dropdown menu, which will expand the rest of the post here.
The short version: KV caching gives you faster and more consistent inference, at the cost of higher GPU memory consumption.
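To make that trade-off concrete, here is a minimal sketch of a single-head attention decoding step with a KV cache, in plain JavaScript. Everything in it (`kCache`, `vCache`, `attendOneToken`) is hypothetical illustration code under simplified assumptions, not an API from the blog post:

```js
// Minimal KV-cache sketch for one attention head (illustrative only).
// Without a cache, decoding step t would recompute K and V for all
// t previous tokens; with a cache, each step appends one new row.

const kCache = []; // one cached key vector per generated token
const vCache = []; // one cached value vector per generated token

function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function softmax(xs) {
  const m = Math.max(...xs); // subtract max for numerical stability
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((s, x) => s + x, 0);
  return exps.map((x) => x / sum);
}

// One decoding step: project only the NEW token's q, k, v; reuse the rest.
function attendOneToken(q, kNew, vNew) {
  kCache.push(kNew); // the only new K/V computation this step...
  vCache.push(vNew); // ...but memory grows linearly with sequence length
  const scale = 1 / Math.sqrt(q.length);
  const weights = softmax(kCache.map((k) => dot(q, k) * scale));
  // Weighted sum of cached values = attention output for the new token.
  return vCache[0].map((_, d) =>
    weights.reduce((s, w, t) => s + w * vCache[t][d], 0)
  );
}

// Each step costs O(t) attention work instead of recomputing O(t) K/V
// projections on top of it; the cache arrays are what eat the extra memory.
const out = attendOneToken([1, 0], [1, 0], [0.5, 0.5]);
console.log(out);
```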
```
pip install kokoro
```

…and still 82M parameters. The same model also runs in JavaScript via the `kokoro-js` package:

```js
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // one of: fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  voice: "af_sky", // see `tts.list_voices()` for the available voices
});
audio.save("audio.wav");
```
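A note on the `dtype` option above: as with quantization in general, the lower-precision variants (`q8`, `q4`, `q4f16`) should shrink the download size and memory footprint at some cost in audio fidelity, while `fp32` is the full-precision reference; which one is acceptable is worth testing for your use case.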