LFM2-VL GUI Assistant

This model is a fine-tuned version of LiquidAI/LFM2-VL-450M on the maharshpatelx/realGUI-800K dataset.

Model Description

LFM2-VL GUI Assistant is a vision-language model specialized in GUI understanding and automation tasks. Given a screenshot, it can identify interface elements, report their locations, and provide guidance on GUI interactions and navigation.

Training Details

  • Base Model: LiquidAI/LFM2-VL-450M
  • Dataset: maharshpatelx/realGUI-800K
  • Training Method: Supervised Fine-Tuning (SFT) with LoRA
  • LoRA Config: r=8, alpha=16, dropout=0.05
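
To make the LoRA numbers above concrete, here is a minimal sketch of what r=8 and alpha=16 imply. It assumes standard LoRA scaling (alpha / r), which this card does not explicitly confirm. The 1024x1024 layer shape is purely hypothetical, and the card does not state which modules were adapted.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update, assuming standard LoRA (alpha / r)."""
    return alpha / r


def lora_added_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with B of shape (d_out, r) and A of shape (r, d_in).
    """
    return r * (d_in + d_out)


# Values from the Training Details list above.
scale = lora_scaling(r=8, alpha=16)

# Hypothetical layer shape, for illustration only (not taken from the model).
extra = lora_added_params(d_out=1024, d_in=1024, r=8)

print(f"scaling factor: {scale}")
print(f"added params for one 1024x1024 layer: {extra}")
```

Under these assumptions the low-rank update is multiplied by 2.0, and each adapted 1024x1024 matrix gains 16,384 trainable parameters, compared with 1,048,576 in the frozen weight.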

Usage

import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("maharshpatelx/lfm2-vl-gui-sft", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    "maharshpatelx/lfm2-vl-gui-sft", 
    device_map="auto", 
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Prepare input
image = Image.open("screenshot.png").convert('RGB')
conversation = [
    {"role": "system", "content": [
        {"type": "text", "text": "You are a GUI automation assistant specialized in understanding user interfaces and providing guidance on GUI interactions. Analyze the screenshot and provide accurate responses about GUI elements, actions, or navigation tasks."}
    ]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "what is bbox location of 'google chrome'?"}
    ]}
]

# Generate response
inputs = processor.apply_chat_template(
    conversation, 
    add_generation_prompt=True, 
    return_tensors="pt",
    tokenize=True,
    return_dict=True
).to(model.device)  # move inputs to the device chosen by device_map="auto"
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
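
For bounding-box queries like the one above, the answer usually needs to be turned into numbers before it can drive automation. The sketch below is one possible approach. This card does not document the output format, so the helper assumes the response contains four comma-separated numbers; `parse_bbox` is a hypothetical name, and the meaning and order of the four values should be checked against real model output.

```python
import re
from typing import Optional, Tuple

# Assumed format: four comma-separated numbers somewhere in the text,
# e.g. "[12, 34, 560, 78.5]". This is NOT confirmed by the model card.
_NUM = r"-?\d+(?:\.\d+)?"
_BBOX = re.compile(rf"({_NUM})\s*,\s*({_NUM})\s*,\s*({_NUM})\s*,\s*({_NUM})")


def parse_bbox(text: str) -> Optional[Tuple[float, ...]]:
    """Return the first group of four numbers found in text, or None."""
    match = _BBOX.search(text)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


# Stand-in string for illustration, not real model output.
example = "bbox: [12, 34, 560, 78.5]"
print(parse_bbox(example))
```

Returning `None` when no box is found lets the caller decide whether to retry or fall back, which matters here because the usage example samples with temperature=0.7 and the output can vary between runs.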

Intended Use

This model is designed for:

  • GUI automation and testing
  • User interface analysis
  • Accessibility assistance
  • Educational purposes in HCI research

Limitations

  • Performance may vary on UI designs significantly different from training data
  • May not generalize well to non-English interfaces
  • Should not be used for malicious automation or unauthorized access