IBM just released small swiss army knife for the document models: granite-docling-258M on Hugging Face π₯
> not only a document converter but also can do document question answering, understand multiple languages π€― > best part: released with Apache 2.0 license π use it with your commercial projects! > it supports transformers, vLLM and MLX from the get-go! π€ > built on SigLIP2 & granite-165M
first vision language model built off openai/gpt-oss-20b just dropped! π₯
InternVL3.5 comes with 32 models π€― pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb comes with gpt-oss or Qwen3 for LLM part ‡οΈ
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! π€― Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! π
How does it work? π€ 1οΈβ£ Generate and cache image features for each frame 2οΈβ£ Create a list of embeddings for selected patch(es) 3οΈβ£ Compute cosine similarity between each patch and the selected patch(es) 4οΈβ£ Highlight those whose score is above some threshold
... et voilΓ ! π₯³
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! π€― π£οΈ Transcribe videos, meeting notes, songs and more π Runs on-device, meaning no data is sent to a server π Multilingual (8 languages) π€ Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! π₯