wuyuhan
yuhanwuuu
1 follower · 2 following
AI & ML interests
None yet
Recent Activity
liked a model 17 days ago
deepseek-ai/DeepSeek-R1-0528
reacted to merve's post with 🔥 18 days ago
what happened in open AI past week? so many vision LM & omni releases 🔥 https://huggingface.co/collections/merve/releases-23-may-68343cb970bbc359f9b5fb05
multimodal
> new moondream (VLM) is out: it's 4-bit quantized (with QAT) version of moondream-2b, runs on 2.5GB VRAM at 184 tps with only 0.6% drop in accuracy (OS)
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM (OS)
> Google DeepMind dropped MedGemma in I/O, VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance
> MMaDa is a new 8B diffusion language model that can generate image and text
LLMs
> Mistral released Devstral, a 24B coding assistant (OS)
> Fairy R1-32B is a new reasoning model -- distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released ACEReason-Nemotron-14B, new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)
image generation
> MTVCrafter is a new human motion animation generator
authored a paper 3 months ago
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
Organizations
Papers
1
arxiv:2503.04872
models
0
None public yet
datasets
0
None public yet