Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding Paper • 2402.18262 • Published Feb 28, 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15, 2024
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024
Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era Paper • 2305.08144 • Published May 14, 2023
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset Paper • 2305.15891 • Published May 25, 2023