Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes? By rohan598 and 4 others • Mar 5, 2024