Energy-Based Transformers are Scalable Learners and Thinkers • Paper • 2507.02092
WebSailor: Navigating Super-human Reasoning for Web Agent • Paper • 2507.02592
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation • Paper • 2506.21546
Post: Multimodal OCR with ReportLab? On Colab T4? (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B) .. Yeah, it's possible. I've made dedicated Colab notebooks to experiment with these models (all built on top of Qwen2.5 VL). 🤗🚀
Download the notebooks here:
✦︎ NanonetsOCR: https://colab.research.google.com/drive/1VvA-amvSVxGdWgIsh4_by6KWOtEs_Iqp
✦︎ MonkeyOCR: https://colab.research.google.com/drive/1vPCojbmlXjDFUt06FJ1tjgnj_zWK4mUo
✦︎ OCRFluxOCR: https://colab.research.google.com/drive/1TDoCXzWdF2hxVLbISqW6DjXAzOyI7pzf
✦︎ TyphoonOCR: https://colab.research.google.com/drive/1_59zvLNnn1kvbiSFxzA1WiqhpbW8RKbz
🜲 GitHub: https://github.com/PRITHIVSAKTHIUR/OCR-ReportLab
What does it do?
1. Performs OCR on the input image
2. Generates a DOCX or PDF file containing the input image and the extracted text
To learn more, visit the model card of the respective model. A minimal sketch of those two steps follows below.
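The sketch below loads one of the Qwen2.5-VL-based OCR checkpoints with Transformers, extracts text from an image, and writes the image plus text to a PDF with ReportLab. The model ID, prompt, and file names are assumptions for illustration, not the exact code in the linked notebooks; see the model cards and the GitHub repo for the canonical usage.

```python
# Hypothetical sketch: OCR with a Qwen2.5-VL-based model + PDF output via ReportLab.
# Model ID, prompt, and file names are placeholders, not the notebooks' exact code.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

MODEL_ID = "nanonets/Nanonets-OCR-s"  # assumed checkpoint; swap for Monkey/OCRFlux/Typhoon OCR

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"  # fits a Colab T4
)

# Step 1: OCR the input image.
image = Image.open("input.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extract all text from this image."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=1024)
text = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]

# Step 2: write the image and the extracted text into a simple PDF.
pdf = canvas.Canvas("ocr_output.pdf", pagesize=A4)
page_w, page_h = A4
pdf.drawImage("input.png", 40, page_h / 2, width=page_w - 80,
              height=page_h / 2 - 60, preserveAspectRatio=True)
text_obj = pdf.beginText(40, page_h / 2 - 40)
for line in text.splitlines():
    text_obj.textLine(line)
pdf.drawText(text_obj)
pdf.save()
```

The same pattern extends to DOCX output by swapping the ReportLab section for a python-docx document, which is the other export path the post mentions.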
Space: Doc VLMs V2 Localization 🐪 • camel-doc-ocr / vilasr-7b / ocrflux-3b / shotvl-7b
Space: VisionScope-R2 🔍 • behemoth-3b / skycaptioner / spacethinker / spaceom / coreocr