Explore OCR, Captioning, and Visual Understanding with Cutting-Edge Models on Hugging Face. 🤗🧪
I’ve put together a collection of Google Colab notebooks for experimenting with some of the most exciting models on the Hugging Face Hub, focused on OCR, image captioning, and visual understanding tasks. [Image-to-Text] / [Image-Text-to-Text]
These notebooks are built for quick prototyping and run on free T4 GPUs, making them perfect for experimentation, testing ideas, or just exploring what’s possible with modern vision-language models.
Note: The experimental notebooks are curated around models that fit within the free-tier T4 GPU’s memory limits. More models, along with their notebooks, will be added over time.
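For reference, one common way to squeeze a 7B-class vision-language model into the roughly 15 GB of a free-tier T4 is 4-bit quantization with bitsandbytes. This is just a hedged sketch, not necessarily how the notebooks themselves load the models; the model id is a stand-in for any 7B VLM in the collection. Note the T4 has no bfloat16 support, so float16 is used as the compute dtype:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # stand-in for any 7B-class VLM in the collection

# 4-bit NF4 quantization shrinks a 7B model to roughly 5-6 GB of VRAM,
# leaving headroom on a free-tier T4 (~15 GB). float16 compute dtype,
# since the T4 does not support bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```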
Excited to introduce the new experimental model "Qwen2.5-VL-7B-Abliterated-Caption-it", which is performing exceptionally well on image captioning tasks. This variant is specifically tailored for Abliterated Captioning and Uncensored Image Captioning. It is designed to generate highly detailed, descriptive captions across a broad range of visual categories, including images with complex, sensitive, or nuanced content, while handling varying aspect ratios and resolutions. 🧪🤗
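As a quick illustration (not the notebook code itself), captioning with this model should follow the standard Qwen2.5-VL path in transformers. The repo id and prompt below are assumptions, so check the model card on the Hub for the exact usage:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Repo id assumed from the model name; verify the exact path on the Hub.
model_id = "prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # any aspect ratio / resolution
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Write a detailed, descriptive caption for this image."},
    ],
}]

# Build the chat prompt, then bind the PIL image to the image placeholder.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated caption.
caption = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(caption)
```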
allenai/olmOCR-7B-0725 is fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct on the allenai/olmOCR-mix-0225 dataset, pushing the boundaries of OCR technology. It takes a single document image as input, with the longest side resized to 1288 pixels, and offers a high-quality, openly available approach to optical character recognition for parsing PDFs and other complex documents.
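Since olmOCR-7B-0725 sits on top of Qwen2.5-VL-7B-Instruct, a plain transformers call with the documented 1288-pixel resize should presumably work for a single page. The full pipeline (PDF rendering, metadata-aware prompt building) lives in AllenAI’s olmocr toolkit, so treat this as a hedged sketch with a generic prompt:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "allenai/olmOCR-7B-0725"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Per the model card, the longest side of the page image should be 1288 px.
page = Image.open("page.png")
scale = 1288 / max(page.size)
page = page.resize((round(page.width * scale), round(page.height * scale)))

# A generic transcription prompt; the olmocr toolkit builds a richer prompt
# from PDF metadata, so results here are only an approximation.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe the text in this document image."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[page], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
text = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(text)
```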