Inference Endpoints Images

AI & ML interests

The Hugging Face Inference Endpoints Images repository lets AI builders collaborate on creating awesome inference deployments.

Recent Activity

reach-vb 
posted an update 2 days ago
Excited to onboard FeatherlessAI on Hugging Face as an Inference Provider - they bring a fleet of 6,700+ LLMs on-demand to the Hugging Face Hub 🤯

Starting today, you'll be able to access all those LLMs (OpenAI compatible) on HF model pages and via OpenAI client libraries too! 💥

Go play with it today: https://huggingface.co/blog/inference-providers-featherless

P.S. They're also bringing on more GPUs to support all your concurrent requests!
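
To try it from code, here's a minimal sketch using the standard OpenAI client pointed at the Hugging Face Inference Providers router; the base URL, the example model id, and the ":featherless-ai" provider suffix are assumptions to illustrate the pattern, so check the blog post for exact values:

import os
from openai import OpenAI

# Standard OpenAI client pointed at the HF Inference Providers
# router (assumed base URL); authenticate with a Hugging Face token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":featherless-ai" suffix (assumed) pins the request to the
# FeatherlessAI provider; the model id is just an example.
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3:featherless-ai",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)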
a-r-r-o-w 
posted an update 2 days ago
Did you know how simple it is to get started with your own custom compiler backend for torch.compile? What's stopping you from writing your own compiler?

import torch
from torch._functorch.partitioners import draw_graph

def compiler(fx_module: torch.fx.GraphModule, _):
    # Dump the captured FX graph to a Graphviz .dot file, then run
    # the module unchanged: the simplest possible "compiler".
    draw_graph(fx_module, "compile.dot")
    return fx_module.forward

def capture(model, *inputs):
    # torch.compile routes the traced forward (and, once backward()
    # runs, the backward graph) through our custom backend.
    compiled_model = torch.compile(model, backend=compiler)
    y = compiled_model(*inputs)
    y.sum().backward()

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.linear_1 = torch.nn.Linear(16, 32)
        self.linear_2 = torch.nn.Linear(32, 16)

    def forward(self, x):
        x = self.linear_1(x)
        x = torch.nn.functional.silu(x)
        x = self.linear_2(x)
        return x

if __name__ == '__main__':
    model = MLP()
    model.to("mps")  # Apple Silicon GPU; use "cuda" or "cpu" elsewhere
    x = torch.randn(4, 16, device="mps", dtype=torch.float32)

    capture(model, x)
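
If you just want to eyeball what torch.compile hands your backend without generating a .dot file, a variant of the same idea (assuming only that the backend receives a torch.fx.GraphModule, as above) is to print the graph as a table:

import torch

def printing_compiler(fx_module: torch.fx.GraphModule, example_inputs):
    # Print the captured ops as a table (requires the `tabulate`
    # package) instead of writing a .dot file.
    fx_module.graph.print_tabular()
    return fx_module.forward

model = torch.nn.Linear(16, 32)
compiled = torch.compile(model, backend=printing_compiler)
compiled(torch.randn(4, 16))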


--------------

Part of https://huggingface.co/posts/a-r-r-o-w/231008365980283
a-r-r-o-w 
posted an update 3 days ago
Recently, I've been focusing my learning on the following topics:
- PyTorch internals, specifically the inductor system (roughly ~1 month of experience)
- Triton internals (~8 months)
- CUDA (~3 months)
- Understanding fusion patterns in compilers and how to improve them (~1 month)
- Parallelism strategies for large-scale inference optimization (~6-7 months)

I thought it would be nice to document it somewhere. Maybe someone will find it useful? It's also that I want to get into the habit of writing but have had no motivation to do so; maybe writing short informal posts will help build the habit.

Since I don't have a personal site, and don't plan to create one in the near future, I think HF posts are best suited for short and informal documentation to share my little discoveries and learnings. If you're interested, strap in!

The first post in this series will be a basic study of PyTorch's float32 matmuls and their Triton implementation (nothing much, just the tutorial available on the website), with a short dive into TF32 and a TFLOPS comparison on an A100 machine.
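
For anyone who wants to poke at the TF32 part ahead of that post, here's a rough sketch of the standard PyTorch switch plus a naive CUDA-event timing; the matrix sizes and the timing harness are illustrative, not a rigorous benchmark:

import torch

# "highest" (the default) uses full float32 matmuls; "high" allows
# TF32 (10-bit mantissa) on Ampere+ GPUs such as the A100.
torch.set_float32_matmul_precision("high")

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

a @ b  # warmup: kernel selection / cache effects

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
c = a @ b
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)
flops = 2 * 4096**3  # 2*M*N*K FLOPs for an M x K @ K x N matmul
print(f"{flops / (ms * 1e-3) / 1e12:.2f} TFLOPS")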
AdinaY 
posted an update 6 days ago
RoboBrain 2.0 🔥 an OPEN embodied brain model by BAAI (Beijing Academy of Artificial Intelligence)

BAAI/RoboBrain2.0-7B

✨ 7B - Apache 2.0 / 32B coming soon
✨ Supports multiple images, long videos, and high-resolution visuals
✨ Spatial + temporal reasoning
✨ Real-time memory & scene graphs
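
A minimal loading sketch for the 7B checkpoint; the Auto classes and the trust_remote_code flag used here are assumptions, so treat the model card as the source of truth:

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

repo = "BAAI/RoboBrain2.0-7B"

# Assumption: the repo ships custom modeling/processing code,
# hence trust_remote_code=True; check the model card to be sure.
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)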
AdinaY 
posted an update 8 days ago
RedNote 小红书 just released their first LLM 🔥

dots.llm1.base 🪐 a 142B MoE model with only 14B active params.

rednote-hilab/dotsllm1-68246aaaaba3363374a8aa7c
✨ Base & Instruct - MIT license
✨ Trained on 11.2T tokens of high-quality, non-synthetic data
✨ Competitive with Qwen2.5/3 on reasoning, code, alignment
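
A quick loading sketch; the exact repo id is an assumption inferred from the collection above, so verify it on the Hub before running:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the instruct variant; all 142B MoE params
# load even though only 14B are active per token, so this needs
# serious multi-GPU memory.
repo = "rednote-hilab/dots.llm1.inst"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Write a haiku about MoE models.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))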
AdinaY 
posted an update 8 days ago
MiniCPM4 🔥 efficient LLMs built for edge devices, by OpenBMB

openbmb/minicpm4-6841ab29d180257e940baa9b

✨ Apache 2.0
✨ 5–7× Faster Inference (Jetson Orin & RTX 4090)
✨ 8B trained on 8T clean, non-synthetic tokens
✨ 32K Native Context -> 128K+ with InfLLM v2 + LongRoPE
✨ Runs on 🤗 Transformers, CPM.cu, vLLM, and SGLang
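
Since vLLM is listed among the supported runtimes, here's a minimal offline-inference sketch with vLLM's Python API; the repo id for the 8B checkpoint is an assumption, so check the collection above for the exact name:

from vllm import LLM, SamplingParams

# Assumed repo id for the 8B checkpoint; the model may ship custom
# code, hence trust_remote_code=True.
llm = LLM(model="openbmb/MiniCPM4-8B", trust_remote_code=True)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain why sparse attention helps long-context inference."], params)
print(outputs[0].outputs[0].text)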
jbilcke-hf 
posted an update 8 days ago
Hi everyone,

I've seen some unsuccessful attempts at running Wan2GP inside a Hugging Face Space, which is a shame as it is a great Gradio app!

So here is a fork that you can use, with some instructions on how to do this:

jbilcke-hf/Wan2GP_you_must_clone_this_space_to_use_it#1

Note: some things like persistent models/storage/custom LoRAs might not fully work out of the box. If you need those, you might have to dig into the Wan2GP codebase and see how to tweak the storage folder. Happy hacking!
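
If you'd rather clone the Space programmatically than through the UI, huggingface_hub ships a helper for exactly this (a sketch, assuming you're logged in with a write token):

from huggingface_hub import duplicate_space

# Clones the Space into your own namespace. Assumes you are logged
# in (huggingface-cli login) with a token that has write access.
repo = duplicate_space(
    "jbilcke-hf/Wan2GP_you_must_clone_this_space_to_use_it",
    private=True,
)
print(repo)  # URL of your new Space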
