Wanrong Zhu

VegB · https://wanrong-zhu.com/

AI & ML interests

None yet

Organizations

openflamingo

authored a paper 6 months ago

Towards Visual Text Grounding of Multimodal Large Language Model

Paper • 2504.04974 • Published Apr 7 • 16
authored 2 papers over 1 year ago

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 28

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 18
authored a paper almost 2 years ago

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 15
authored 3 papers about 2 years ago

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Paper • 2304.06939 • Published Apr 14, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Paper • 2308.01390 • Published Aug 2, 2023 • 33

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Paper • 2308.06595 • Published Aug 12, 2023 • 6