Multimodal OCR with ReportLab? On Colab T4? (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B?) ... Yeah, it’s possible. I’ve made a dedicated Colab notebook to experiment with these models (all built on top of Qwen2.5 VL). 🤗🚀
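Here’s a minimal sketch of the kind of code the notebook runs, assuming the nanonets/Nanonets-OCR-s checkpoint (any of the models above can be swapped in), fp16 weights so it fits a 16 GB T4, and a recent transformers release with Qwen2.5-VL support:

```python
# Hedged sketch: image -> Markdown with a Qwen2.5-VL-based OCR model on a Colab T4.
# The model id and prompt wording are assumptions; check the model card for specifics.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "nanonets/Nanonets-OCR-s"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # fp16 fits a 16 GB T4
)

image = Image.open("page.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to Markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=1024)
markdown = processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(markdown)
```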
A batch of comparable demos for multimodal VLMs (excelling in OCR, cinematography understanding, spatial reasoning, etc.) is now up on the Hub 🤗, covering recent releases up to Jun'25.
Inference for generative AI models can look like a minefield, but there’s a simple protocol for picking the best setup:
🌍 95% of users >> If you’re using open (large) models and need fast online inference, use Inference Providers on auto mode and let it pick the best provider for the model (see the sketch after this list). https://huggingface.co/docs/inference-providers/index
👷 Fine-tuners / bespoke >> If you’ve got custom setups, use Inference Endpoints to define a configuration on AWS, Azure, or GCP. https://endpoints.huggingface.co/
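For the 95% case, the whole thing is a few lines with huggingface_hub; the model id below is just an example of an open model served by the providers:

```python
# Hedged sketch: a chat completion routed through Inference Providers on auto mode.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="auto")  # lets HF pick the provider; reads HF_TOKEN from the env

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open model; any provider-served model works
    messages=[{"role": "user", "content": "In one sentence, what do Inference Providers do?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```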
The demo features Camel-Doc-OCR-062825 (exp), optimized for document retrieval and direct Markdown (.md) generation from images and PDFs. Additional demos include OCRFlux-3B (document OCR), VilaSR (spatial reasoning with visual drawing), and ShotVL (cinematic language understanding). 🐪
The space runs on a community GPU grant from Hugging Face (special thanks to them). It supports image inference and video inference, with a result Markdown canvas and object detection/localization. 🤗🚀
To learn more, visit the model card of the respective model.
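For the video-inference task, one common approach (not necessarily what this space does internally, just a sketch) is to sample a handful of frames and pass them to the VLM as a multi-image prompt:

```python
# Hedged sketch: evenly sample frames from a video so they can be fed to an
# image-text-to-text model as a multi-image prompt. Assumes opencv-python and Pillow.
import cv2
from PIL import Image

def sample_frames(path: str, num_frames: int = 8) -> list[Image.Image]:
    """Evenly sample num_frames frames from the video at `path` as PIL images."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ok, frame = cap.read()
        if ok:
            # OpenCV returns BGR arrays; convert to RGB before handing off to the model
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames

frames = sample_frames("clip.mp4")  # pass these alongside a text prompt to the VLM
```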
This is what efficient AI looks like: Gemma 3n just dropped - a natively multimodal model that runs entirely on your device. No cloud. No API calls.
🧠 Text, image, audio, and video - handled locally.
⚡️ Needs only ~2 GB of GPU memory to run 🤯
🏆 First sub-10B model to hit 1300+ Elo
✅ Plug-and-play with Hugging Face Transformers, MLX, llama.cpp, and more (sketch below).
Plus: multilingual out of the box (140+ languages), and you can fine-tune it in a free Colab notebook.
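A minimal sketch of the Hugging Face path, assuming the google/gemma-3n-E2B-it repo id and a transformers release with Gemma 3n support (output follows the standard image-text-to-text pipeline format):

```python
# Hedged sketch: local image + text inference with Gemma 3n via the transformers pipeline.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-E2B-it",  # assumed repo id for the 2B-effective-footprint variant
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # the last turn is the model's reply
```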
The demo covers DREX-062225-exp (Document Retrieval and Extraction eXpert ~ experimental), typhoon-ocr-3b (a bilingual document parsing model built specifically for real-world documents), VIREX-062225-exp (Video Information Retrieval and Extraction eXpert ~ experimental), and olmOCR-7B-0225-preview (a document parsing model based on Qwen2-VL). 🤗
Updated docscopeOCR-7B-050425-exp to DREX-062225-exp, with improved precision in table structure and line spacing in the Markdown produced for each document page. Though still experimental, it's expected to perform well in the defined DREX use cases [ Document Retrieval and Extraction eXpert – experimental OCR ]. 💻
The demo for SmolDocling / Nanonets OCR / Typhoon OCR / Monkey OCR explores the document OCR capabilities of several newly released multimodal VLMs in a single space. If you're experimenting with or demoing long-document image OCR, kindly use the SmolDocling 256M preview [ SmolDocling is back in the demo here ]; a page-by-page sketch follows below. 🤗
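Here’s a page-by-page sketch for long documents, assuming the ds4sd/SmolDocling-256M-preview repo id, the "Convert this page to docling." instruction from its model card, and pdf2image (which needs poppler) for rasterizing the PDF:

```python
# Hedged sketch: long-document OCR by rasterizing a PDF and running SmolDocling per page.
import torch
from pdf2image import convert_from_path
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ds4sd/SmolDocling-256M-preview"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float32)

messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Convert this page to docling."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

pages = convert_from_path("report.pdf", dpi=200)  # one PIL image per page
per_page = []
for page in pages:
    inputs = processor(text=prompt, images=[page], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=2048)
    per_page.append(processor.decode(out[0][inputs["input_ids"].shape[-1]:],
                                     skip_special_tokens=True))

# SmolDocling emits DocTags markup; docling-core can convert it to Markdown/HTML if needed.
print("\n\n".join(per_page))
```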
🌐 Clinical Trials Dataset now available on Hugging Face! 🧬
I’ve just released a comprehensive, ML-ready dataset featuring 500,000+ clinical trial records sourced directly from ClinicalTrials.gov for biomedical NLP, healthcare analytics, and clinical research applications 🤗
I wanted to produce the most complete and up-to-date dump with all raw data partially flattened to simplify extraction, self-querying and processing.
Do you have any ideas about what we can do with it? Using descriptions to enhance specialized embedding models?
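One direction (a minimal sketch; the dataset id and column name below are placeholders, check the dataset card for the real ones) is to embed the trial descriptions with an off-the-shelf sentence-transformers model and use them for retrieval or for training a specialized biomedical embedder:

```python
# Hedged sketch: embed clinical-trial descriptions for semantic search / embedder training.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

ds = load_dataset("your-username/clinical-trials", split="train")  # placeholder dataset id
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

subset = ds.select(range(1_000))                  # small slice for a quick test
texts = [row["description"] for row in subset]    # assumed column name
embeddings = encoder.encode(texts, batch_size=64, show_progress_bar=True)
print(embeddings.shape)  # (1000, 384) with the MiniLM encoder
```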
The demo combines, in a single space, the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm; Nanonets-OCR-s, a powerful, state-of-the-art image-to-Markdown OCR model that goes far beyond traditional text extraction; and other experimental document OCR models.
Extending datasets just got a whole lot easier! 🚀 With Sheets, I was able to create a Spanish version of the popular fka/awesome-chatgpt-prompts dataset in just a few minutes ⏱️.
Want to try it out for yourself? Head over to the Sheets space and see how easy it is to extend and modify existing datasets 🤯. The possibilities are endless! 🌐