Qwen Fine-Tuning Results

[Figure: sample invoice image]

This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct on the invoice_train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0481

Model description

  • The Qwen2-VL-2B model has been fine-tuned on OCR-rich invoice data from the CORD-v2 dataset, allowing it to recognize both the content and layout of invoices effectively. The model outputs structured information directly, enabling downstream processing or integration into accounting systems.

For each invoice image, the model identifies and extracts the following fields (an illustrative output sketch follows the list):

  • Menu Items

  • Item Prices

  • Subtotal Price

  • Total Price

  • Tax Amount

  • Cash Given

  • Change Amount
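
As a rough illustration only (the field names below are assumptions loosely following CORD-v2 conventions, not a documented schema), an extraction for a simple receipt might look like this sketch:

```python
import json

# Illustrative output only: field names and values are assumptions,
# not the model's documented schema.
extraction = {
    "menu": [
        {"nm": "Iced Coffee", "price": "3.50"},
        {"nm": "Bagel", "price": "2.25"},
    ],
    "sub_total": {"subtotal_price": "5.75", "tax_price": "0.46"},
    "total": {"total_price": "6.21", "cashprice": "10.00", "changeprice": "3.79"},
}

print(json.dumps(extraction, indent=2))
```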

More Info

  • Base Model: Qwen2-VL-2B-Instruct, a 2B-parameter vision-language model built on the Qwen2 language model.

  • Fine-Tuning: Supervised learning on OCR + structure pairs from the CORD-v2 dataset.

  • Input: OCR-annotated invoice image data from the CORD-v2 dataset.

  • Output: Structured extraction of key financial fields in JSON format (a loading and inference sketch follows this list).
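
A minimal loading and inference sketch, assuming the standard Qwen2-VL chat interface; the image path and the prompt wording are assumptions, since the card does not publish the instruction used during fine-tuning:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

# Load the base vision-language model, then attach the fine-tuned adapter.
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "Alawy21/Invoice_Extraction_Qwen2_2B_Finetuning")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Assumption: the exact fine-tuning prompt is not documented in this card.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "invoice.jpg"},
        {"type": "text", "text": "Extract the invoice fields as JSON."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens and decode only the generated JSON.
generated = output_ids[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```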

Training and evaluation data

  • Training Set: 800 samples, used to fine-tune the Qwen2-VL-2B model to extract key invoice components from OCR text and layout information (a dataset-loading sketch follows this list).

  • Evaluation Set: 100 samples, used to assess the model’s ability to generalize and accurately extract fields such as menu items, prices, subtotal, tax, cash, and change from unseen invoices.
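
A sketch of how such a split could be assembled from the public CORD-v2 release on the Hugging Face Hub; the dataset ID and the way the 800/100 subset is taken are assumptions, since the exact invoice_train preprocessing is not published here:

```python
from datasets import load_dataset

# Assumption: the public CORD-v2 release on the Hugging Face Hub.
cord = load_dataset("naver-clova-ix/cord-v2")

# Assumption: the 800/100 split reported above is taken from the head
# of the official train and validation splits.
train_ds = cord["train"].select(range(800))
eval_ds = cord["validation"].select(range(100))

# Each example pairs an invoice image with its ground-truth JSON string.
print(train_ds[0]["ground_truth"][:200])
```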

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
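
A minimal transformers TrainingArguments sketch matching the values above; the output path, evaluation cadence, and logging settings are assumptions, since the actual training script is not part of this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="invoice_qwen2_2b_lora",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,       # effective train batch size of 4
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # AdamW with betas=(0.9, 0.999), eps=1e-8
    seed=42,
    eval_strategy="steps",               # assumption: evaluation every 100 steps (see results table)
    eval_steps=100,
    logging_steps=100,
)
```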

Training results

Training Loss   Epoch   Step   Validation Loss
0.0779          0.5     100    0.0685
0.0647          1.0     200    0.0511
0.0292          1.5     300    0.0500
0.0280          2.0     400    0.0449
0.0130          2.5     500    0.0488
0.0116          3.0     600    0.0481

Framework versions

  • PEFT 0.14.0
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1