Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?
By rohan598 and 4 others • Mar 5, 2024