LFM2-VL GUI Assistant
This model is a fine-tuned version of LiquidAI/LFM2-VL-450M on the maharshpatelx/realGUI-800K dataset.
Model Description
A vision-language model specialized in GUI understanding and automation tasks. The model can analyze screenshots and provide guidance on GUI interactions, element identification, and navigation tasks.
Training Details
- Base Model: LiquidAI/LFM2-VL-450M
- Dataset: maharshpatelx/realGUI-800K
- Training Method: Supervised Fine-Tuning (SFT) with LoRA
- LoRA Config: r=8, alpha=16, dropout=0.05
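As a rough illustration of what these LoRA values imply, the sketch below computes the scaling factor applied to the low-rank update (alpha / r) and the number of trainable adapter parameters for a single linear layer. In the PEFT library these settings correspond to the `r`, `lora_alpha`, and `lora_dropout` fields of `LoraConfig`. The layer size used here (1024 x 1024) is a hypothetical value chosen only for illustration, and the helper names are not part of any library; the card does not state which modules were adapted.

```python
# Values taken from the LoRA config listed above.
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05  # applied to the adapter input during training only


def lora_scaling(r: int, alpha: int) -> float:
    """Factor that multiplies the low-rank update B @ A before it is added to W."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r) for one adapted linear layer."""
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
D_IN = D_OUT = 1024

scaling = lora_scaling(LORA_R, LORA_ALPHA)
adapter_params = lora_trainable_params(D_IN, D_OUT, LORA_R)
full_params = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params per layer = {adapter_params}")
print(f"fraction of full weight = {adapter_params / full_params:.4f}")
```

With a small rank such as 8, the adapter adds only a small fraction of the parameters of the layer it modifies, which is why LoRA fine-tuning is practical on modest hardware.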
Usage
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("maharshpatelx/lfm2-vl-gui-sft", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    "maharshpatelx/lfm2-vl-gui-sft",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Prepare input
image = Image.open("screenshot.png").convert("RGB")
conversation = [
    {"role": "system", "content": [
        {"type": "text", "text": "You are a GUI automation assistant specialized in understanding user interfaces and providing guidance on GUI interactions. Analyze the screenshot and provide accurate responses about GUI elements, actions, or navigation tasks."}
    ]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "what is bbox location of 'google chrome'?"}
    ]},
]

# Generate response
# Move the inputs to the model's device so generation does not fail
# with a device mismatch when device_map="auto" places the model on a GPU.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
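The decoded `response` above will typically contain the system and user text as well as the model's reply, because `generate` usually returns the prompt tokens followed by the newly generated ones for decoder-only models. A minimal sketch of keeping only the new tokens is shown below. The helper name `strip_prompt_tokens` is hypothetical, and the toy token ids stand in for real model output.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Drop the first `prompt_length` token ids from each generated sequence."""
    return [sequence[prompt_length:] for sequence in output_ids]


# Toy stand-in for generate() output: 3 prompt tokens followed by 2 new tokens.
toy_outputs = [[101, 7, 8, 42, 43]]
toy_prompt_length = 3

new_tokens = strip_prompt_tokens(toy_outputs, toy_prompt_length)
print(new_tokens)
```

In the usage snippet, the prompt length would be `inputs["input_ids"].shape[1]`, and the stripped sequences can be passed to `processor.batch_decode(..., skip_special_tokens=True)` in place of `outputs`. This assumes a single example or left-padded batch, so that every prompt ends at the same position.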
Intended Use
This model is designed for:
- GUI automation and testing
- User interface analysis
- Accessibility assistance
- Educational purposes in HCI research
Limitations
- Performance may degrade on UI designs that differ significantly from the training data
- May not generalize well to non-English interfaces
- Should not be used for malicious automation or unauthorized access