Qwen 2.5 VL 3B - Invoice Fine-tuned
This model is a fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct specifically optimized for invoice data extraction tasks.
Model Details
- Base Model: Qwen/Qwen2.5-VL-3B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Specialization: Invoice and document data extraction
- Model Size: ~3B parameters
- Fine-tuned by: vrushankkk
Intended Use
This model is designed for:
- Invoice data extraction
- Document analysis
- Structured data extraction from images
- OCR and information extraction tasks
Usage
from transformers import Qwen2_5_VLProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
# Load model and processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
"vrushankkk/qwen-2.5-3b-vl-fine-tuned",
torch_dtype="auto",
device_map="auto"
)
processor = Qwen2_5_VLProcessor.from_pretrained("vrushankkk/qwen-2.5-3b-vl-fine-tuned")
# Prepare your image and prompt
image = Image.open("your_invoice.jpg")
prompt = "Extract invoice data and return as JSON"
# Process inputs
inputs = processor(text=prompt, images=image, return_tensors="pt")
# Generate response
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=512)
response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)
Training Details
- Training Data: Invoice dataset with 3000 samples
- Fine-tuning Method: LoRA adapters
- Base Model: Qwen 2.5 VL 3B Instruct
Limitations
- Specialized for invoice/document extraction tasks
- Performance may vary on other vision-language tasks
- Requires good quality input images for optimal results
Model Card Authors
vrushankkk
- Downloads last month
- -
Model tree for vrushankkk/qwen-2.5-3b-vl-fine-tuned
Base model
Qwen/Qwen2.5-VL-3B-Instruct