Qwen 2.5 VL 3B - Invoice Fine-tuned

This model is a fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct specifically optimized for invoice data extraction tasks.

Model Details

  • Base Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Specialization: Invoice and document data extraction
  • Model Size: ~3B parameters
  • Fine-tuned by: vrushankkk

Intended Use

This model is designed for:

  • Invoice data extraction
  • Document analysis
  • Structured data extraction from images
  • OCR and information extraction tasks

Usage

from transformers import Qwen2_5_VLProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image

# Load model and processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "vrushankkk/qwen-2.5-3b-vl-fine-tuned",
    torch_dtype="auto",
    device_map="auto"
)
processor = Qwen2_5_VLProcessor.from_pretrained("vrushankkk/qwen-2.5-3b-vl-fine-tuned")

# Prepare your image and prompt
image = Image.open("your_invoice.jpg")
prompt = "Extract invoice data and return as JSON"

# Process inputs
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Generate response
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)
    
response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Details

  • Training Data: Invoice dataset with 3000 samples
  • Fine-tuning Method: LoRA adapters
  • Base Model: Qwen 2.5 VL 3B Instruct

Limitations

  • Specialized for invoice/document extraction tasks
  • Performance may vary on other vision-language tasks
  • Requires good quality input images for optimal results

Model Card Authors

vrushankkk

Downloads last month
-
Safetensors
Model size
3.75B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for vrushankkk/qwen-2.5-3b-vl-fine-tuned

Adapter
(53)
this model