Dolphin: new OCR model by ByteDance with MIT license 🐬
the model first detects elements in the layout (tables, formulas, etc.) and then parses each element in parallel for generation
Model: ByteDance/Dolphin
Try the demo: ByteDance/Dolphin
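As a rough sketch of that two-stage idea (not Dolphin's actual API), here is what "detect the layout once, then parse each element concurrently" can look like; `detect_layout`, `parse_element`, and `Element` are hypothetical placeholders for the model's stage-1 and stage-2 calls:

```python
# Illustrative skeleton of the "analyze layout first, parse elements in parallel" flow.
# detect_layout() and parse_element() are hypothetical stand-ins for the real model calls.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Element:
    kind: str    # "table", "formula", "text", ...
    bbox: tuple  # (x0, y0, x1, y1) region to crop from the page

def detect_layout(page_image) -> list[Element]:
    """Stage 1: one generation pass returning page elements in reading order."""
    raise NotImplementedError  # placeholder for the real layout-detection call

def parse_element(page_image, element: Element) -> str:
    """Stage 2: parse one cropped element (table -> HTML, formula -> LaTeX, ...)."""
    raise NotImplementedError  # placeholder for the real element-parsing call

def parse_page(page_image) -> list[str]:
    elements = detect_layout(page_image)
    # stage 2 is independent per element, so the crops can be parsed concurrently
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda el: parse_element(page_image, el), elements))
```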
Brand new MCP Course units are out, and now it's getting REAL! We've collaborated with Anthropic to dive deep into production-ready, autonomous agents using MCP
This is what the new material covers:
- Use Claude Code to build an autonomous PR agent
- Connect your agent with Slack and GitHub to integrate it with your team
- Get certified on your use case and share it with the community
- Build an autonomous PR cleanup agent on the Hugging Face Hub and deploy it with Spaces
The material goes deep into these problems and helps you build applications that work. We're super excited to see what you build with it.
stop building parser pipelines: there's a new document parser that's small, fast, Apache 2.0 licensed, and better than all the others! 😱
echo840/MonkeyOCR is a 3B model that can parse everything (charts, formulas, tables, etc.) in a document 🤗
> the authors show in the paper that document parsing pipelines often suffer from errors propagating through the stages
> single end-to-end models do better, but they're too heavy to use
this model addresses both: it's lighter, faster, stronger 🔥
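A minimal sketch for grabbing the weights from the Hub; the local_dir path is just an example, and for actual parsing you'd follow the instructions on the echo840/MonkeyOCR model card:

```python
# Download the MonkeyOCR checkpoint from the Hugging Face Hub.
# For running inference, follow the model card instructions.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="echo840/MonkeyOCR",      # repo id from the post
    local_dir="./MonkeyOCR-weights",  # example destination, pick any path
)
print(f"weights downloaded to {local_path}")
```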
> based on ViT, available in different sizes (L/G/H) and resolutions (286/384)
> 0-day support in 🤗 transformers
> comes with physical reasoning (from video) benchmarks (MVPBench, IntPhys 2, and CausalVQA) and a leaderboard: facebook/physical_reasoning_leaderboard
Use this MCP server with tools like Claude Desktop, Cursor, VS Code, or Continue (a config sketch follows below) to do this:
- Define an ML problem like Image Classification, LLM fine-tuning, Text Classification, etc.
- The AI can retrieve models and datasets from the Hub using the Hub MCP.
- Training happens on a Hugging Face Space, so no worries about hardware constraints.
- Models are pushed to the Hub to be used with inference tools like llama.cpp, vLLM, MLX, etc.
- Built on top of the AutoTrain library, so it has full integration with transformers and other libraries.
Everything is still under active development, but I'm super excited to hear what people build, and I'm open to contributions!
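As a rough illustration of the wiring (not this project's documented setup), here is what registering an MCP server in Claude Desktop's config usually looks like; the "autotrain" name, the launcher command, and the package name below are hypothetical placeholders, so substitute the real entry point from the README:

```python
# Sketch: add an MCP server entry to Claude Desktop's config file (macOS path shown).
# The "autotrain" name, the launcher command, and the package name are hypothetical
# placeholders -- replace them with the server's real entry point.
import json
import os
import pathlib

config_path = pathlib.Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}

config.setdefault("mcpServers", {})["autotrain"] = {
    "command": "uvx",                  # hypothetical launcher
    "args": ["autotrain-mcp-server"],  # hypothetical package name
    "env": {"HF_TOKEN": os.environ.get("HF_TOKEN", "")},  # Hub auth so the server can pull/push models
}

config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
print(f"wrote {config_path}")
```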
Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SOTA on average across benchmarks, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding (see below) and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B
vision LMs are saturating benchmarks, so we built vibe eval 💬
> compare different models with refreshed in-the-wild examples in different categories 🤗
> submit your favorite model for eval
no numbers -- just vibes!
emerging trend: models that can understand image + text and generate image + text
don't miss out ⤵️
> MMaDA: a single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: a 7B MoT model based on Qwen2.5, SigLIP-so-400M, and the Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱
multimodal 💬🖼️
> new moondream (VLM) is out: a 4-bit quantized (with QAT) version of moondream-2b that runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS)
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance
> MMaDA is a new 8B diffusion language model that can generate images and text
- It's still free!
- Video 1 walks you through onboarding to the course
- The first live session is next week!
- You can now get a certificate via the exam app
- We improved the written material with interactive quizzes
If you're studying MCP and want a live, interactive, visual, certified course, then join us on the Hub!