AI & ML interests

Collection of JS libraries to interact with the Hugging Face Hub

Recent Activity

merve posted an update 12 days ago
large AI labs open-sourced a ton of models last week 🔥
here's a few picks, find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
merve posted an update 17 days ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter, it can also do document question answering and understand multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
merve posted an update 19 days ago
a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, a high-res image generation model, and tencent/POINTS-Reader, cutting-edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
merve posted an update 23 days ago
fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in florence-community org 🫡
merve posted an update about 1 month ago
large AI labs have dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
merve posted an update about 1 month ago
first vision language model built off openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for LLM part ⤵️
Xenova posted an update about 1 month ago
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
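
The four steps above can be sketched roughly as follows. This is a minimal illustration and not the demo's actual source: the `Embedding` type, the function names, the toy vectors, and the 0.8 threshold are all hypothetical, and the per-frame patch features are assumed to have been computed and cached already (step 1).

```typescript
// Hypothetical sketch of patch matching by cosine similarity.
// Names, types, and values are illustrative assumptions, not the demo's API.
type Embedding = number[];

/** Cosine similarity between two vectors of equal length. */
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

/**
 * Return the indices of patches in one frame whose best similarity to any
 * selected embedding is above the threshold (steps 3 and 4).
 */
function matchPatches(
  framePatches: Embedding[],
  selected: Embedding[],
  threshold: number,
): number[] {
  const hits: number[] = [];
  framePatches.forEach((patch, index) => {
    let best = -Infinity;
    for (const ref of selected) {
      best = Math.max(best, cosineSimilarity(patch, ref));
    }
    if (best > threshold) hits.push(index);
  });
  return hits;
}

// Step 1 (assumed done elsewhere): cached patch features for one frame.
const frame: Embedding[] = [[1, 0], [0.9, 0.1], [0, 1]];
// Step 2: embeddings of the patch(es) the user selected.
const selected: Embedding[] = [frame[0]];
// Steps 3 and 4: score every patch and keep those above the threshold.
const highlighted = matchPatches(frame, selected, 0.8);
console.log(highlighted);
```

Selections made on several frames would simply add more entries to `selected`, so a patch is highlighted if it resembles any of the chosen references.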

Excited to see what the community builds with it!
merve posted an update about 2 months ago
GPT-4.1-mini level model right in your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks 🔥

allows commercial use as well!
Xenova posted an update about 2 months ago
The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.
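
As a rough illustration of the general idea, the sketch below shows one way page code could dispatch a tool call emitted by a model. Everything here is hypothetical: the JSON call format, the `tools` registry, and the `set_background` tool are assumptions for illustration, not the format used by LFM2, MCP, or the demo. A real page would have the tool touch the DOM; here it returns a string so the sketch runs anywhere.

```typescript
// Hypothetical tool-call shape: {"name": "...", "arguments": {...}}.
// This format is an assumption for illustration, not a real model's output format.
type ToolArgs = Record<string, unknown>;
type ToolCall = { name: string; arguments: ToolArgs };
type Tool = (args: ToolArgs) => string;

// Registry of functions the page exposes to the model (hypothetical).
const tools = new Map<string, Tool>();
tools.set("set_background", (args) => `background set to ${String(args["color"])}`);

/** Parse the model's output and run the matching tool, if any. */
function dispatchToolCall(modelOutput: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelOutput);
  } catch {
    return "error: model output is not valid JSON";
  }
  if (typeof parsed !== "object" || parsed === null) {
    return "error: tool call must be a JSON object";
  }
  const { name, arguments: args } = parsed as Partial<ToolCall>;
  if (typeof name !== "string") return "error: tool call has no name";
  const tool = tools.get(name);
  if (!tool) return `error: unknown tool "${name}"`;
  // The result would normally be fed back to the model as the tool response.
  return tool(args ?? {});
}

const result = dispatchToolCall('{"name":"set_background","arguments":{"color":"teal"}}');
console.log(result);
```

Keeping the registry explicit means the model can only trigger functions the page has chosen to expose.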

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (which you can find under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
merve posted an update 2 months ago
we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with SOTA performance, support for 100 languages, and a license that allows commercial use! 🤯

single e2e model to extract images and convert tables, formulas, and more into markdown 📝
try it: MohamedRashad/Dots-OCR
merve posted an update 2 months ago
massive releases and tons of FLUX.1 Krea LoRAs this past week!
here's some of the picks, find more models in the collection 🫡 merve/releases-august-2-6890c14248203522b7d0267f

LLMs 💬
> Tencent dropped tencent/Hunyuan-7B-Instruct
> Qwen released Qwen/Qwen3-Coder-30B-A3B-Instruct, 30B MoE with 3B active params for coding (OS)

vision/multimodal
> RedNote released rednote-hilab/dots.ocr - 3B OCR model (OS)
> Cohere released CohereLabs/command-a-vision-07-2025 - 112B (dense!) VLM for 6 languages
> StepFun-AI shipped stepfun-ai/step3 - 321B MoE VLM (OS)
> Skywork shipped Skywork/Skywork-UniPic-1.5B - new any-to-any model (image+text → image+text) (OS)
merve posted an update 2 months ago
past week in open AI was insane 🔥 here's some of the picks, find more here: merve/releases-july-25-688768ca47fe3693407e02d1

💬 LLMs & VLMs
> Qwen/Qwen3-235B-A22B-Thinking-2507 had a new update (OS)
> Qwen/Qwen3-Coder-480B-A35B-Instruct is out with 480B total, 35B active params 🤯 (OS)
> AllenAI dropped an update to allenai/olmOCR-7B-0725 📝
> InternLM released internlm/Intern-S1 - 235B Qwen3 MoE + 6B InternViT encoder (OS)
> OmniSVG/OmniSVG is a new SVG generation VLM (OS)

🖼️ image/video/3D generation
> WanAI released Wan2.2 series - both T2V and I2V 14B models for high-quality video generation (OS) multimodalart/wan-22-688767e313337b434ed55112
> Tencent dropped tencent/HunyuanWorld-1 - image-to-3D scene generation model
merve posted an update 2 months ago
🤯 241B VLM with apache-2.0 license internlm/Intern-S1

internlm released Intern-S1: a multimodal reasoning model based on 235B MoE Qwen3 and 6B InternViT 😍

benchmarks look great (👑 best model ✅ best open model)
Xenova posted an update 2 months ago
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU
merve posted an update 2 months ago
so many open LLMs and image LoRAs dropped this past week, here's some picks for you 🫡 merve/releases-july-18-687e3fbd2ab9b39c51f9238b

LLMs
> ByteDance released a bunch of translation models called Seed-X-RM (7B) ByteDance-Seed/Seed-X-RM-7B
> NVIDIA released reasoning models, of which the 32B surpasses the giant Qwen3-235B, with cc-by-4.0 license 👏 nvidia/openreasoning-nemotron-687730dae0170059860f1f01
> LG released a new EXAONE model (32B) LGAI-EXAONE/EXAONE-4.0-32B

VLMs/any-to-any
> vidore/colqwen-omni-v0.1 is a new any-to-any retriever (MIT)
> HiDream-ai/HiDream-E1-1 is an image+text in, image+text out model (MIT)

LoRAs
> There's a bunch of LoRAs based on Flux Kontext, gotta check out the collection 🤠