Huggingface Projects

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

huggingface-projects's activity

merve 
posted an update 3 days ago
AdinaY 
posted an update 5 days ago
view post
Post
2669
DeepSeek, Alibaba, Skywork, Xiaomi, Bytedance.....
And that’s just part of the companies from the Chinese community that released open models in April 🤯

zh-ai-community/april-2025-open-releases-from-the-chinese-community-67ea699965f6e4c135cab10f

🎬 Video
> MAGI-1 by SandAI
> SkyReels-A2 & SkyReels-V2 by Skywork
> Wan2.1-FLF2V by Alibaba-Wan

🎨 Image
> HiDream-I1 by Vivago AI
> Kimi-VL by Moonshot AI
> InstantCharacter by InstantX & Tencent-Hunyuan
> Step1X-Edit by StepFun
> EasyControl by Shanghai Jiaotong University

🧠 Reasoning
> MiMo by Xiaomi
> Skywork-R1V 2.0 by Skywork
> ChatTS by ByteDance
> Kimina by Moonshot AI & Numina
> GLM-Z1 by Zhipu AI
> Skywork OR1 by Skywork
> Kimi-VL-Thinking by Moonshot AI

🔊 Audio
> Kimi-Audio by Moonshot AI
> IndexTTS by BiliBili
> MegaTTS3 by ByteDance
> Dolphin by DataOceanAI

🔢 Math
> DeepSeek Prover V2 by Deepseek

🌍 LLM
> Qwen by Alibaba-Qwen
> InternVL3 by Shanghai AI lab
> Ernie4.5 (demo) by Baidu

📊 Dataset
> PHYBench by Eureka-Lab
> ChildMandarin & Seniortalk by BAAI

Please feel free to add if I missed anything!
AdinaY 
posted an update 5 days ago
view post
Post
1754
Xiaomi just entered the open source as a new player🔥 And dropped MiMo - a 7B model trained from scratch for reasoning.

XiaomiMiMo/MiMo-7B-RL

✨ 7B - Base/RL/SFT/RL zero
✨ Surpasses 32B models in math & code
✨ Apache 2.0 licensed
AdinaY 
posted an update 5 days ago
merve 
posted an update 5 days ago
view post
Post
2484
Meta released Llama Guard 4 and new Prompt Guard 2 models 🔥

Llama Guard 4 is a new model to filter model inputs/outputs both text-only and image 🛡️ use it before and after LLMs/VLMs! meta-llama/Llama-Guard-4-12B

Prompt Guard 2 22M & 86M are smol models to prevent model jailbreaks and prompt injections ⚔ meta-llama/Llama-Prompt-Guard-2-22M meta-llama/Llama-Guard-4-12B
Both come with new release of transformers 🤗

Try the model right away 👉🏻https://github.com/huggingface/huggingface-llama-recipes/blob/main/llama_guard_4.ipynb

Read our blog to learn more and easily get started 👉🏻 https://huggingface.co/blog/llama-guard-4 🦙
  • 1 reply
·
Xenova 
posted an update 7 days ago
AdinaY 
posted an update 7 days ago
view post
Post
5026
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
merve 
posted an update 10 days ago
view post
Post
3915
Don't sleep on new AI at Meta Vision-Language release! 🔥

facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
facebook/perception-lm-67f9783f171948c383ee7498

Meta dropped swiss army knives for vision with A2.0 license 👏
> image/video encoders for vision language modelling and spatial understanding (object detection etc) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets

The authors attempt to come up with single versatile vision encoder to align on diverse set of tasks.

They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. For zero-shot image tasks, it outperforms latest sota SigLIP2 👏



> Among fine-tuned ones, first one is PE-Spatial. It's a model to detect bounding boxes, segmentation, depth estimation and it outperforms all other models 😮



> Second one is PLM, Perception Language Model, where they combine PE-Core with Qwen2.5 LM 7B. it outperforms all other models (including InternVL3 which was trained with Qwen2.5LM too!)

The authors release the following checkpoints in sizes base, large and giant:

> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)
> Datasets



Authors release following datasets 📑
> PE Video: Gigantic video datasete of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: Human and auto-annotated image and video datasets on region-based tasks
> PLM-VideoBench: New video benchmark on MCQA
  • 2 replies
·
victor 
posted an update 12 days ago
view post
Post
2968
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
AdinaY 
posted an update 12 days ago
view post
Post
3460
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
  • 1 reply
·
merve 
posted an update 12 days ago
view post
Post
3347
New foundation model on image and video captioning just dropped by NVIDIA AI 🔥

Describe Anything Model (DAM) is a 3B vision language model to generate detailed captions with localized references 😮

The team released the models, the dataset, a new benchmark and a demo 🤩 nvidia/describe-anything-680825bb8f5e41ff0785834c

Most of the vision LMs focus on image as a whole, lacking localized references in captions, and not taking in visual prompts (points, boxes, drawings around objects)

DAM addresses this on two levels: new vision backbone that takes in focal crops and the image itself, and a large scale dataset 👀

They generate a dataset by extending existing segmentation and referring expression generation datasets like REFCOCO, by passing in the images and classes to VLMs and generating captions.

Lastly, they also release a new benchmark again with self-supervision, they use an LLM to evaluate the detailed captions focusing on localization 👏
linoyts 
posted an update 13 days ago
AdinaY 
posted an update 13 days ago
AdinaY 
posted an update 14 days ago
AdinaY 
posted an update 18 days ago
view post
Post
2066
Wan2.1-FLF2V🎥 a 14B start-end frame video generation model just released by Alibaba_Wan🔥

Wan-AI/Wan2.1-FLF2V-14B-720P

✨ Give it two images (start & end), it generates a smooth, high-quality video in between.
✨ Apache 2.0 licensed
✨ Built on DiT + Flow Matching
  • 1 reply
·
giadap 
posted an update 18 days ago
view post
Post
1572
🤗 Just published: "Consent by Design" - exploring how we're building better consent mechanisms across the HF ecosystem!

Our research shows open AI development enables:
- Community-driven ethical standards
- Transparent accountability
- Context-specific implementations
- Privacy as core infrastructure

Check out our Space Privacy Analyzer tool that automatically generates privacy summaries of applications!

Effective consent isn't about perfect policies; it's about architectures that empower users while enabling innovation. 🚀

Read more: https://huggingface.co/blog/giadap/consent-by-design
Xenova 
posted an update 19 days ago
view post
Post
2529
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯

Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️

Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.

For now, try out the demo! 👇
webml-community/Zyphra-ZR1-WebGPU
  • 1 reply
·
AdinaY 
posted an update 20 days ago
view post
Post
892
After yesterday's wave of reveals, here's what's going down today in the Chinese AI community 🔥

✨ Kuaishou unveiled Kling AI 2.0
https://klingai.com/global/

✨ MiniMax AI dropped their latest TTS model Speech-02
https://minimax.io/audio

✨ Tencent Hunyuan teased the upcoming open model - Hunyuan Portrait
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation (2503.18860)

✨ ModelScope launched an MCP Square, with 1,500 MCPs already online
https://modelscope.cn/mcp

And it's only Tuesday🌞