---
license: apache-2.0
language:
- en
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
base_model:
- Qwen/Qwen2-VL-7B
---

# WebDreamer: Model-Based Planning for Web Agents

WebDreamer is a planning framework that enables efficient and effective planning for real-world web agent tasks. See our paper for details.
This work is a collaboration between [OSUNLP](https://x.com/osunlp) and [Orby AI](https://www.orby.ai/).

![image](https://github.com/user-attachments/assets/a1189fee-ff43-45fc-a818-3dc6befb6ad2)

- **Repository:** https://github.com/OSU-NLP-Group/WebDreamer
- **Paper:** https://arxiv.org/abs/2411.06559
- **Point of Contact:** [Kai Zhang](mailto:[email protected])

## Models

- Dreamer-7B:
  - [General](https://huggingface.co/osunlp/Dreamer-7B)
  - [In-Domain-VWA-Shopping](https://huggingface.co/osunlp/Dreamer-7B-Shopping)
  - [In-Domain-VWA-Classifieds](https://huggingface.co/osunlp/Dreamer-7B-Classifieds)
  - [In-Domain-VWA-Reddit](https://huggingface.co/osunlp/Dreamer-7B-Reddit)
- [Training Data (Coming soon)]()

## Results

### Strong performance on VisualWebArena and Mind2Web-live

| Benchmark | Method | Success Rate |
|---------------------|-------------------------|------------------|
| **VisualWebArena**  | GPT-4o + Reactive       | 17.6%            |
|                     | GPT-4o + Tree Search    | 26.2%            |
|                     | **GPT-4o + WebDreamer** | 23.6% (↑34.1%)   |
| **Online-Mind2Web** | GPT-4o + Reactive       | 26.0%            |
|                     | **GPT-4o + WebDreamer** | 37.0% (↑42.3%)   |
| **Mind2Web-live**   | GPT-4o + Reactive       | 20.2%            |
|                     | **GPT-4o + WebDreamer** | 25.0% (↑23.8%)   |

Compared to the reactive baselines, WebDreamer delivers relative improvements of 34.1%, 42.3%, and 23.8% on VisualWebArena, Online-Mind2Web, and Mind2Web-live, respectively.

### Better efficiency than tree search with true interactions

<img width="1502" alt="image" src="https://github.com/user-attachments/assets/0afbc22d-b1eb-4026-a167-e1852cde7677">

WebDreamer explores the search space through simulation, which greatly reduces reliance on real-world interactions while maintaining robust performance.
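Conceptually, simulation-based planning amounts to a loop that imagines the outcome of each candidate action with an LLM world model, scores the imagined states, and executes only the most promising action. A minimal sketch, assuming hypothetical `simulate` and `score` callables (in the real system, both are LLM calls):

```python
def plan_with_simulation(state, candidate_actions, simulate, score):
    """Pick the action whose *simulated* outcome scores highest.

    simulate(state, action) -> imagined next state (e.g., an LLM-predicted
    description of the page after taking the action);
    score(state) -> float estimate of progress toward the task goal.
    Both are hypothetical stand-ins for LLM prompts.
    """
    best_action, best_value = None, float("-inf")
    for action in candidate_actions:
        imagined = simulate(state, action)  # no real web interaction needed
        value = score(imagined)
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

Because only the finally selected action touches the live website, the agent avoids the many real interactions (and irreversible side effects) that tree search over true page states requires.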

## Inference

### vLLM server

```bash
vllm serve osunlp/Dreamer-7B --api-key token-abc123 --dtype float16
```

or

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name osunlp/Dreamer-7B --model osunlp/Dreamer-7B --dtype float16
```

You can find more instructions on training and inference in [Qwen2-VL's official repo](https://github.com/QwenLM/Qwen2-VL).
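Either command exposes an OpenAI-compatible API, by default at `http://localhost:8000/v1`. A small sketch of wiring a client to it (host, port, and the `AsyncOpenAI` usage in the comment are assumptions; adjust to your deployment):

```python
def vllm_base_url(host: str = "localhost", port: int = 8000) -> str:
    """Base URL of the OpenAI-compatible API that `vllm serve` exposes by default."""
    return f"http://{host}:{port}/v1"

# Hypothetical client setup with the `openai` package, matching the
# --api-key passed to `vllm serve` above:
# from openai import AsyncOpenAI
# client = AsyncOpenAI(base_url=vllm_base_url(), api_key="token-abc123")
```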

### Prompt

Our model is fairly robust to the exact wording of the textual prompt, so feel free to try variations we did not heavily explore.

```python
def format_openai_template(description: str, base64_image):
    # `description` is the proposed action (e.g., "click on the search bar").
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {
                    "type": "text",
                    "text": f"""
Below is current screenshot. Please describe what you would see after a {description}"""
                },
            ],
        },
    ]


messages = format_openai_template(description, base64_image)

# Must run inside an async function; `client` is an AsyncOpenAI instance
# pointed at the vLLM server above.
completion = await client.chat.completions.create(
    model=args.model_path,
    messages=messages,
    temperature=1.0,
)
```
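The template above expects `base64_image` to be the base64-encoded bytes of the screenshot. A minimal helper for producing it (the file path in the usage comment is hypothetical):

```python
import base64


def screenshot_to_base64(image_bytes: bytes) -> str:
    """Encode raw screenshot bytes for the data-URL used in the prompt above."""
    return base64.b64encode(image_bytes).decode("ascii")


# Usage (hypothetical path):
# with open("screenshot.jpg", "rb") as f:
#     base64_image = screenshot_to_base64(f.read())
```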

## Citation Information

If you find this work useful, please consider citing our paper:

```
@article{Gu2024WebDreamer,
  author     = {Yu Gu and Kai Zhang and Yuting Ning and Boyuan Zheng and Boyu Gou and Tianci Xue and Cheng Chang and Sanjari Srivastava and Yanan Xie and Peng Qi and Huan Sun and Yu Su},
  title      = {Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents},
  journal    = {CoRR},
  volume     = {abs/2411.06559},
  year       = {2024},
  url        = {https://arxiv.org/abs/2411.06559},
  eprinttype = {arXiv},
  eprint     = {2411.06559},
}
```