---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-32B-Instruct
---

# TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

This model was trained on the [GUI-Net dataset](https://huggingface.co/datasets/Bofeee5675/GUI-Net-1M).

See our [project page](https://github.com/TongUI-agent/TongUI-agent) for details.

## Model Details

The base model is `Qwen/Qwen2.5-VL-32B-Instruct`. We fine-tuned the base model with LoRA.

**Note:** Because the 32B model is large, we release only the LoRA adapter weights. To merge them into the base model, use the following script:

```python
import torch
from peft import PeftModel
from transformers import AutoConfig, AutoProcessor, Qwen2_5_VLForConditionalGeneration


def load_model_and_processor(model_path, precision="bf16", lora_path=None, merge_lora=True):
    """
    Load the Qwen2.5-VL model and processor with optional LoRA weights.

    Args:
        model_path: Path to the base model
        precision: Model precision ("fp16", "bf16", or "fp32")
        lora_path: Path to LoRA weights (optional)
        merge_lora: Boolean indicating whether to merge LoRA weights

    Returns:
        tuple: (processor, model) - The initialized processor and model
    """
    # Initialize processor
    try:
        processor = AutoProcessor.from_pretrained(model_path)
    except Exception as e:
        # Print the model config to help diagnose the failure, then re-raise
        print(f"Error loading processor: {e}")
        config = AutoConfig.from_pretrained(model_path)
        print(config)
        raise

    # Map the precision string to a torch dtype (anything else falls back to fp32)
    dtype = {"fp16": torch.float16, "bf16": torch.bfloat16}.get(precision, torch.float32)

    # Initialize base model
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype=dtype,
        attn_implementation="flash_attention_2",  # requires the flash-attn package
    )

    # Load LoRA weights if path is provided
    if lora_path:
        print(f"Loading LoRA weights from {lora_path}")
        model = PeftModel.from_pretrained(model, lora_path)

        # Merging only applies once an adapter has been loaded
        if merge_lora:
            print("Merging LoRA weights into base model")
            model = model.merge_and_unload()

    model.eval()

    return processor, model
```

`model_path` is the path to the base model, and `lora_path` is the local directory where you downloaded this repo.
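
For intuition about the merge step, the sketch below works through the standard LoRA formulation on a toy example in plain Python. It assumes the usual update `W' = W + (alpha / r) * B @ A`, which is the default scaling in PEFT; the matrices, `alpha`, and `r` here are made-up illustrative values, not the ones used for this adapter. It shows that the merged weight gives the same output as running the base weight and the adapter side by side.

```python
# Toy illustration of LoRA merging with plain Python lists.
# All values below are hypothetical; no real model weights are involved.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold the adapter into the base weight: W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def adapter_forward(w, lora_a, lora_b, alpha, r, x):
    """Unmerged path: W x + (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matmul(w, x)
    low_rank = matmul(lora_b, matmul(lora_a, x))
    return [
        [b + scale * l for b, l in zip(b_row, l_row)]
        for b_row, l_row in zip(base, low_rank)
    ]


# 2x2 base weight, rank-1 adapter (A is 1x2, B is 2x1), column-vector input.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -1.0]]
B = [[2.0], [1.0]]
x = [[1.0], [2.0]]
ALPHA, R = 2, 1

merged = merge_lora(W, A, B, ALPHA, R)
y_merged = matmul(merged, x)
y_adapter = adapter_forward(W, A, B, ALPHA, R, x)
print(y_merged == y_adapter)
```

Because the merged weight reproduces the adapter's output, a merged model no longer needs the adapter to be applied separately at inference time.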