A benchmark multimodal oro-dental dataset for large vision-language models Paper • 2511.04948 • Published Nov 7
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language Paper • 2505.10055 • Published May 15
Transformer-based Spatial Grounding: A Comprehensive Survey Paper • 2507.12739 • Published Jul 17