ll-re-hf

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

ll-re-hf's activity

merve
posted an update 2 days ago
Dolphin: new OCR model by ByteDance with MIT license 🐬

the model first detects the elements in the layout (tables, formulas, etc.) and then parses each element in parallel for generation
Model: ByteDance/Dolphin
Try the demo: ByteDance/Dolphin
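
For intuition, here's a conceptual sketch of that two-stage flow in plain Python. detect_layout and parse_element are hypothetical stand-ins, not Dolphin's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for Dolphin's two stages (not its real API).
def detect_layout(page_image):
    # Stage 1: identify layout elements (tables, formulas, text blocks)
    # in reading order.
    return [("table", "crop_0"), ("formula", "crop_1"), ("text", "crop_2")]

def parse_element(element):
    # Stage 2: decode one cropped element with an element-specific prompt.
    kind, crop = element
    return f"<parsed {kind} from {crop}>"

def parse_document(page_image):
    elements = detect_layout(page_image)
    # Elements are decoded independently, so stage 2 can run in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(parse_element, elements))

print(parse_document("page.png"))
```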
reach-vb
posted an update 2 days ago
Excited to onboard FeatherlessAI on Hugging Face as an Inference Provider - they bring a fleet of 6,700+ LLMs on-demand on the Hugging Face Hub 🀯

Starting today, you can access all of those LLMs (OpenAI-compatible) on HF model pages and via OpenAI client libraries too! 💥

Go play with it today: https://huggingface.co/blog/inference-providers-featherless

P.S. They're also bringing on more GPUs to support all your concurrent requests!
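
If you want to try it from code, something like this should work with the OpenAI Python client pointed at the Hugging Face router (the model id and the ":featherless-ai" provider suffix below are illustrative assumptions):

```python
import os
from openai import OpenAI

# OpenAI-compatible router for Hugging Face Inference Providers;
# authenticate with a Hugging Face token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":featherless-ai" suffix asks the router to serve this model via
# Featherless (model id chosen here only as an example).
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct:featherless-ai",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```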
burtenshaw
posted an update 3 days ago
Brand new MCP Course units are out, and now it's getting REAL! We've collaborated with Anthropic to dive deep into production-ready, autonomous agents using MCP.

πŸ”— mcp-course

This is what the new material covers:

- Use Claude Code to build an autonomous PR agent
- Integrate your agent with Slack and GitHub to bring it to your team
- Get certified on your use case and share it with the community
- Build an autonomous PR cleanup agent on the Hugging Face Hub and deploy it with Spaces

The material goes deep into these problems and helps you build applications that work. We’re super excited to see what you build with it.
merve
posted an update 4 days ago
stop building parser pipelines πŸ‘‹πŸ»
there's a new document parser that is small, fast, Apache 2.0 licensed, and better than all the others! 😱

echo840/MonkeyOCR is a 3B model that can parse everything (charts, formulas, tables, etc.) in a document 🤠
> the authors show in the paper that errors in document parsing pipelines often propagate across stages
> single end-to-end models do better, but they're too heavy to use

this model addresses both: it's lighter, faster, stronger πŸ”₯
merve
posted an update 4 days ago
Meta just released V-JEPA 2: new open-source image/video world models β―οΈπŸ€— facebook/v-jepa-2-6841bad8413014e185b497a6

> based on ViT, different sizes (L/G/H) and resolutions (256/384)
> 0-day support in 🤗 transformers (see the minimal usage sketch below)
> comes with a physical reasoning (from video) benchmark: MVPBench, IntPhys 2, and CausalVQA facebook/physical_reasoning_leaderboard

Read more https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/
We will release a fine-tuning notebook with task-specific models in transformers format soon, stay tuned!
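
Here's a minimal feature-extraction sketch with transformers; the checkpoint name and the dummy clip are assumptions on my part, so check the release collection for the actual ids:

```python
import torch
from transformers import AutoModel, AutoVideoProcessor

# Assumed checkpoint id from the V-JEPA 2 collection; other sizes and
# resolutions are available.
repo = "facebook/vjepa2-vitl-fpc64-256"
model = AutoModel.from_pretrained(repo)
processor = AutoVideoProcessor.from_pretrained(repo)

# Dummy clip: 16 RGB frames at 256x256, standing in for a real video.
video = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # patch-level video embeddings
```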
burtenshaw
posted an update 4 days ago
Super excited to release Autotrain MCP. This is an MCP server for training AI models, so you can use your AI tools to train your AI models 🀯.

πŸ”— burtenshaw/autotrain-mcp

Use this MCP server with tools like Claude Desktop, Cursor, VSCode, or Continue to do this:

- Define an ML problem like Image Classification, LLM fine-tuning, Text Classification, etc.
- The AI can retrieve models and datasets from the hub using the hub MCP.
- Training happens on a Hugging Face Space, so no worries about hardware constraints.
- Models are pushed to the Hub to be used with inference tools like Llama.cpp, vLLM, MLX, etc.
- Built on top of the AutoTrain library, so it has full integration with transformers and other libraries.

Everything is still under active development, but I’m super excited to hear what people build, and I’m open to contributions!
merve
posted an update 10 days ago
ariG23498
posted an update 11 days ago
🚨 Implement KV Cache from scratch in pure PyTorch. 🚨

We have documented everything we learned while implementing KV Cache in nanoVLM. Joint work with @kashif @lusxvr @andito @pcuenq

Blog: hf.co/blog/kv-cache
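
For a flavor of the idea, here's a minimal single-layer KV cache in pure PyTorch (a sketch, not the exact nanoVLM implementation):

```python
import torch

class KVCache:
    """Cache of attention keys/values for one layer (minimal sketch,
    not the exact nanoVLM implementation)."""

    def __init__(self):
        self.k = None  # (batch, n_heads, seq_len, head_dim)
        self.v = None

    def update(self, k_new, v_new):
        # Append the new token's K/V along the sequence axis, so past
        # tokens never need to be re-projected during decoding.
        if self.k is None:
            self.k, self.v = k_new, v_new
        else:
            self.k = torch.cat([self.k, k_new], dim=2)
            self.v = torch.cat([self.v, v_new], dim=2)
        return self.k, self.v

# Decode loop: each step projects only the newest token, then attends
# over everything cached so far.
cache = KVCache()
batch, heads, head_dim = 1, 8, 64
for step in range(4):
    q = torch.randn(batch, heads, 1, head_dim)      # new token's query
    k_new = torch.randn(batch, heads, 1, head_dim)  # new token's key
    v_new = torch.randn(batch, heads, 1, head_dim)  # new token's value
    k, v = cache.update(k_new, v_new)
    scores = q @ k.transpose(-2, -1) / head_dim**0.5
    out = torch.softmax(scores, dim=-1) @ v         # (1, 8, 1, 64)
```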
merve
posted an update 11 days ago
Past week was insanely packed for open AI! 😱
Luckily we picked some highlights for you ❀️ lfg!

πŸ’¬ LLMs/VLMs
> Deepseek 🐳 released deepseek-ai/DeepSeek-R1-0528, a 671B MoE (37B active params), only 0.2 and 1.4 points behind o3 in AIME 24/25 🤯 they also released an 8B distilled version based on Qwen3 (OS) deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
> Xiaomi released MiMo-7B-RL (LLM for code and math) and MiMo-VL-7B-RL (VLM for visual reasoning, GUI agentic task and general use) (OS) 😍 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212
> NVIDIA released nvidia/Nemotron-Research-Reasoning-Qwen-1.5B, a new reasoning model
> Dataset: MiniMax released https://huggingface.co/MiniMaxAI/SynLogic, 49k new logical reasoning examples across 35 tasks, including cipher solving, sudoku, and more!

πŸ–ΌοΈ Image/Video Generation
> Tencent released tencent/HunyuanPortrait, a new model for consistent portrait generation under the SVD Research license. They also released tencent/HunyuanVideo-Avatar, an audio-driven avatar generation model (OS)
> showlab released showlab/OmniConsistency, consistent stylization model (OS)
> Rapidata/text-2-video-human-preferences-veo3 is a new T2V preference dataset based on videos from Veo3 with 46k examples (OS)

AudioπŸ—£οΈ
> https://huggingface.co/ResembleAI/Chatterbox is a new 500M text-to-speech model preferred over ElevenLabs (OS) 😍
> PlayHT/PlayDiffusion is a new speech editing model (OS)

Other
> https://huggingface.co/NX-AI/TiReX is a new time series foundation model
> Yandex released a huge (4.79B examples!) video recommendation dataset https://huggingface.co/yandex/yambda

OS ones have Apache 2.0 or MIT licenses; find more models and datasets here merve/releases-30-may-6840097345e0b1e915bff843
merve
posted an update 11 days ago
Yesterday was the day of vision language action models (VLAs)!

> SmolVLA: open-source small VLA for robotics by Hugging Face LeRobot team πŸ€–
Blog: https://huggingface.co/blog/smolvla
Model: lerobot/smolvla_base

> Holo-1: 3B & 7B web/computer use agentic VLAs by H Company πŸ’»
Model family: Hcompany/holo1-683dd1eece7eb077b96d0cbd
Demo: https://huggingface.co/spaces/multimodalart/Holo1
Blog: https://huggingface.co/blog/Hcompany/holo1
super exciting times!!
merve
posted an update 12 days ago
merve
posted an update 13 days ago
merve
posted an update 14 days ago
New GUI model by Salesforce AI & Uni HK: Jedi
tianbaoxiexxx/Jedi xlangai/Jedi-7B-1080p πŸ€—
Based on Qwen2.5-VL with Apache 2.0 license

prompted with a screenshot (shown in the original post) → it selects "find more"
merve
posted an update 16 days ago
HOT: MiMo-VL, new 7B vision LMs by Xiaomi surpassing GPT-4o (March), competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

not only that, but also MIT license & usable with transformers πŸ”₯
merve
posted an update 17 days ago
introducing: VLM vibe eval πŸͺ­ visionLMsftw/VLMVibeEval

vision LMs have saturated the benchmarks, so we built vibe eval 💬

> compare different models with refreshed in-the-wild examples in different categories 🀠
> submit your favorite model for eval
no numbers -- just vibes!
merve
posted an update 19 days ago
emerging trend: models that can understand image + text and generate image + text

don't miss out ‡️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱

I keep track of all any-input → any-output models here https://huggingface.co/collections/merve/any-to-any-models-6822042ee8eb7fb5e38f9b62
merve
posted an update 20 days ago
what happened in open AI this past week? so many vision LM & omni releases 🔥 merve/releases-23-may-68343cb970bbc359f9b5fb05

multimodal πŸ’¬πŸ–ΌοΈ
> new moondream (VLM) is out: it's a 4-bit quantized (with QAT) version of moondream-2b, runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS) 🌚
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance

> MMaDA is a new 8B diffusion language model that can generate image and text

LLMs
> Mistral released Devstral, a 24B coding assistant (OS) πŸ‘©πŸ»β€πŸ’»
> Fairy R1-32B is a new reasoning model -- distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released ACEReason-Nemotron-14B, new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)

image generation 🎨
> MTVCrafter is a new human motion animation generator
merve
posted an update 24 days ago
Google released MedGemma at I/O '25 👏 google/medgemma-release-680aade845f90bec6a3f60c4

> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go πŸ€—

they also released a cool demo for scan reading ➑️ google/rad_explain

use with transformers ‡️
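
The snippet isn't in this copy of the post, so here's a minimal sketch with the image-text-to-text pipeline; the checkpoint id and image URL are my assumptions (the 4B instruction-tuned model is gated, so accept the license and log in first):

```python
from transformers import pipeline

# Assumed checkpoint: the 4B instruction-tuned MedGemma (gated; requires
# accepting the license and `huggingface-cli login`).
pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it")

messages = [
    {
        "role": "user",
        "content": [
            # Placeholder URL; point this at a real scan.
            {"type": "image", "url": "https://example.com/chest_xray.png"},
            {"type": "text", "text": "Describe the findings in this X-ray."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```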
burtenshaw
posted an update 24 days ago
MCP course is now LIVE! We just dropped quizzes, videos, and live streams to make it a fully interactive course:

πŸ”— join in now: mcp-course

- It’s still free!
- Video 1 walks you through onboarding to the course
- The first live session is next week!
- You can now get a certificate via the exam app
- We improved the written material with interactive quizzes

If you’re studying MCP and want a live, interactive, visual, certified course, then join us on the hub!
merve
posted an update 24 days ago
You can translate this post 🤗💗