Wanrong Zhu

VegB

https://wanrong-zhu.com/

authored a paper 6 months ago

Towards Visual Text Grounding of Multimodal Large Language Model

Paper • 2504.04974 • Published Apr 7, 2025 • 16

authored 2 papers over 1 year ago

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 28

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 18

authored a paper almost 2 years ago

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 15

authored 3 papers about 2 years ago

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Paper • 2304.06939 • Published Apr 14, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Paper • 2308.01390 • Published Aug 2, 2023 • 33

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Paper • 2308.06595 • Published Aug 12, 2023 • 6