LFM2-VL GUI Assistant

This model is a fine-tuned version of LiquidAI/LFM2-VL-450M on the maharshpatelx/realGUI-800K dataset.

Model Description

A vision-language model specialized in GUI understanding and automation tasks. The model can analyze screenshots and provide guidance on GUI interactions, element identification, and navigation tasks.

Training Details

Base Model: LiquidAI/LFM2-VL-450M
Dataset: maharshpatelx/realGUI-800K
Training Method: Supervised Fine-Tuning (SFT) with LoRA
LoRA Config: r=8, alpha=16, dropout=0.05
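
As a rough illustration of what these LoRA settings imply, the sketch below computes the adapter scaling factor and the number of trainable parameters LoRA adds to a single linear layer. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / r. The layer size of 1024 x 1024 is a hypothetical value chosen for illustration, not a dimension taken from this model, and `lora_param_count` is a helper introduced here.

```python
# LoRA hyperparameters as listed in the training details above.
r = 8
alpha = 16
dropout = 0.05

# In the standard LoRA formulation the low-rank update B @ A is scaled by alpha / r.
scaling = alpha / r


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer size, for illustration only (not taken from the model).
d_in = d_out = 1024
added = lora_param_count(d_in, d_out, r)
full = d_in * d_out

print(f"scaling factor: {scaling}")
print(f"LoRA params per layer: {added} ({added / full:.2%} of the full weight matrix)")
```

With these settings, each adapted layer trains only a small fraction of the weights that full fine-tuning of the same layer would update.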

Usage

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("maharshpatelx/lfm2-vl-gui-sft", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    "maharshpatelx/lfm2-vl-gui-sft", 
    device_map="auto", 
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Prepare input
image = Image.open("screenshot.png").convert('RGB')
conversation = [
    {"role": "system", "content": [
        {"type": "text", "text": "You are a GUI automation assistant specialized in understanding user interfaces and providing guidance on GUI interactions. Analyze the screenshot and provide accurate responses about GUI elements, actions, or navigation tasks."}
    ]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "what is bbox location of 'google chrome'?"}
    ]}
]

# Generate response
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)  # keep inputs on the same device as the model
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
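
For automation workflows, the text response usually needs to be turned into coordinates. The sketch below is one possible way to do that. The model's exact output format is not documented here, so it assumes the response contains a bounding box written as four comma-separated numbers in brackets or parentheses, for example `[x1, y1, x2, y2]`. The helper name `extract_bbox` and the sample string are hypothetical.

```python
import re
from typing import Optional, Tuple

# One number, optionally negative, optionally with a decimal part.
_NUM = r"(-?\d+(?:\.\d+)?)"
# Four comma-separated numbers wrapped in [...] or (...).
# ASSUMPTION: the model writes boxes in this form; adjust if it does not.
_BBOX_RE = re.compile(r"[\[(]\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*[\])]")


def extract_bbox(text: str) -> Optional[Tuple[float, float, float, float]]:
    """Return the first bracketed group of four numbers in `text`, or None."""
    match = _BBOX_RE.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    return (x1, y1, x2, y2)


# Hypothetical response text, used only to exercise the parser.
sample = "The bbox is [12, 34, 256, 78.5]."
print(extract_bbox(sample))
print(extract_bbox("no box here"))
```

Returning `None` when no box is found lets the caller decide whether to retry generation or fall back to another strategy, rather than acting on guessed coordinates.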

Intended Use

This model is designed for:

GUI automation and testing
User interface analysis
Accessibility assistance
Educational purposes in HCI research

Limitations

Performance may degrade on UI designs that differ significantly from the training data
May not generalize well to non-English interfaces
Should not be used for malicious automation or unauthorized access

Model size: 0.5B params (Safetensors)
Tensor type: F32
