Image classification using fine-tuned ViT - for sorting historical :bowtie: documents

Goal: sort archive page images into categories so they can be routed to further content-based processing

Scope: image processing, training and evaluation of the ViT model, input file/directory processing, output of the top-N predicted classes 🏷️ (categories), summarizing predictions in a tabular format, HF 😊 hub support for the model

Model description 📇

🔲 Fine-tuned model repository: vit-historical-page ^1 🔗

🔳 Base model repository: Google's vit-base-patch16-224 ^2 🔗

Data 📜

Training set of the model: 8950 images

Categories 🏷️

Label	Ratio	Description
DRAW	11.89%	📈 - drawings, maps, paintings with text
DRAW_L	8.17%	📈📏 - drawings ... with a table legend or inside tabular layout / forms
LINE_HW	5.99%	✏️📏 - handwritten text lines inside tabular layout / forms
LINE_P	6.06%	📏 - printed text lines inside tabular layout / forms
LINE_T	13.39%	📏 - machine typed text lines inside tabular layout / forms
PHOTO	10.21%	🌄 - photos with text
PHOTO_L	7.86%	🌄📏 - photos inside tabular layout / forms or with a tabular annotation
TEXT	8.58%	📰 - mixed types of printed and handwritten texts
TEXT_HW	7.36%	✏️📄 - only handwritten text
TEXT_P	6.95%	📄 - only printed text
TEXT_T	13.53%	📄 - only machine typed text

Evaluation set (same proportions): 995 images
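As a quick sanity check on the numbers above, the ratios can be turned into approximate per-class image counts. This is a minimal sketch: the ratios and set sizes are copied from this card, while the rounding and the helper name `approx_counts` are illustrative assumptions, so the real per-class counts may differ by an image or two.

```python
# Class ratios copied from the Categories table (percent of the dataset).
RATIOS = {
    "DRAW": 11.89, "DRAW_L": 8.17, "LINE_HW": 5.99, "LINE_P": 6.06,
    "LINE_T": 13.39, "PHOTO": 10.21, "PHOTO_L": 7.86, "TEXT": 8.58,
    "TEXT_HW": 7.36, "TEXT_P": 6.95, "TEXT_T": 13.53,
}


def approx_counts(total):
    """Estimate images per class by rounding total * ratio (hypothetical helper)."""
    return {label: round(total * pct / 100) for label, pct in RATIOS.items()}


train_counts = approx_counts(8950)  # training set size stated in the card
eval_counts = approx_counts(995)    # evaluation set size stated in the card

for label in RATIOS:
    print(f"{label:8s} train ~{train_counts[label]:5d}  eval ~{eval_counts[label]:4d}")

# The published ratios are rounded to two decimals, so they need not sum to exactly 100.
print(f"sum of ratios: {sum(RATIOS.values()):.2f}%")
```

The smallest classes (LINE_HW, LINE_P) end up with roughly 60 evaluation images each under this estimate, which is worth keeping in mind when reading per-class results.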

Data preprocessing

During training, the following transforms were applied randomly with a 50% chance:

transforms.ColorJitter(brightness=0.5)
transforms.ColorJitter(contrast=0.5)
transforms.ColorJitter(saturation=0.5)
transforms.ColorJitter(hue=0.5)
transforms.Lambda(lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5)))
transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2))))
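The "50% chance" gating can be sketched without any image library. The card does not say whether each transform is gated independently or whether the whole group is switched on and off together; the sketch below assumes independent gating, and `apply_each_with_probability` plus the stand-in transforms are hypothetical names used only for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence


def apply_each_with_probability(
    sample,
    transforms: Sequence[Callable],
    p: float = 0.5,
    rng: Optional[random.Random] = None,
):
    """Apply every transform independently with probability p, in list order."""
    rng = rng if rng is not None else random.Random()
    for transform in transforms:
        if rng.random() < p:
            sample = transform(sample)
    return sample


def tag(name: str) -> Callable[[List[str]], List[str]]:
    """Stand-in transform: records its name instead of editing an image."""
    return lambda applied: applied + [name]


names = ["brightness", "contrast", "saturation", "hue", "sharpness", "blur"]
transforms_list = [tag(n) for n in names]

# Each call yields a different random subset of the six augmentations.
for seed in range(3):
    print(apply_each_with_probability([], transforms_list, p=0.5, rng=random.Random(seed)))
```

With real images, the same effect is usually obtained by wrapping each transform in torchvision's `transforms.RandomApply([...], p=0.5)` and chaining the wrappers with `transforms.Compose`.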

Training Hyperparameters

eval_strategy "epoch"
save_strategy "epoch"
learning_rate 5e-5
per_device_train_batch_size 8
per_device_eval_batch_size 8
num_train_epochs 3
warmup_ratio 0.1
logging_steps 10
load_best_model_at_end True
metric_for_best_model "accuracy"
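The parameter names above match fields of Hugging Face `TrainingArguments`. To see what they imply for the training schedule, here is a rough estimate of the number of optimizer steps and warmup steps. The device count, the absence of gradient accumulation, and keeping the last partial batch are assumptions that this card does not state.

```python
import math

# Values copied from this card.
TRAIN_IMAGES = 8950
PER_DEVICE_TRAIN_BATCH_SIZE = 8
NUM_TRAIN_EPOCHS = 3
WARMUP_RATIO = 0.1

# Assumptions (not stated in the card): a single device, no gradient
# accumulation, and the last partial batch of each epoch is kept.
NUM_DEVICES = 1
GRAD_ACCUM_STEPS = 1

effective_batch = PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(TRAIN_IMAGES / effective_batch)
total_steps = steps_per_epoch * NUM_TRAIN_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
```

Under these assumptions, `logging_steps 10` gives a few hundred log entries over the run, and the learning rate reaches its peak of 5e-5 after roughly the first tenth of training.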

Results 📊

Evaluation set's accuracy (Top-3): 99.6%

Evaluation set's accuracy (Top-1): 97.3%
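For readers unfamiliar with the two metrics, Top-N accuracy counts a prediction as correct when the true label is among the N highest-scoring categories. The sketch below illustrates the definition; the scores are made-up toy values, not model output, and `top_k_accuracy` is a hypothetical helper rather than the evaluation code used for this model.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true label is among the k best-scored classes.

    scores: list of {label: score} dicts, one per page
    true_labels: list of true labels, same order as scores
    """
    hits = 0
    for per_class, truth in zip(scores, true_labels):
        ranked = sorted(per_class, key=per_class.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy scores for four pages (illustrative only).
toy_scores = [
    {"TEXT_T": 0.90, "LINE_T": 0.07, "TEXT_P": 0.03},
    {"PHOTO": 0.55, "PHOTO_L": 0.40, "DRAW": 0.05},
    {"DRAW": 0.60, "DRAW_L": 0.30, "PHOTO": 0.08, "TEXT": 0.02},
    {"TEXT_HW": 0.80, "TEXT": 0.15, "LINE_HW": 0.05},
]
toy_truth = ["TEXT_T", "PHOTO_L", "TEXT", "TEXT_HW"]

top1 = top_k_accuracy(toy_scores, toy_truth, k=1)
top3 = top_k_accuracy(toy_scores, toy_truth, k=3)
print(f"Top-1: {top1:.2f}, Top-3: {top3:.2f}")
```

Top-3 accuracy can never be lower than Top-1, which is consistent with the 99.6% and 97.3% reported above.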

Result tables

Manually ✍ checked evaluation dataset results (TOP-3): model_TOP-3_EVAL.csv 🔗
Manually ✍ checked evaluation dataset results (TOP-1): model_TOP-1_EVAL.csv 🔗

Table columns

FILE - name of the file
PAGE - page number
CLASS-N - category label 🏷️ of the N-th ranked guess (TOP-N)
SCORE-N - score of the category 🏷️ for the N-th ranked guess (TOP-N)
TRUE - actual category label 🏷️
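To make the column layout concrete, the following sketch builds one row of such a table and writes it as CSV. The concrete header spellings (`CLASS-1`, `SCORE-1`, ...), the column order, the score rounding, and the helper `make_row` are assumptions for illustration; the published CSV files may be laid out differently, and the file name and scores are invented.

```python
import csv
import io


def make_row(file_name, page, ranked, true_label=None):
    """Build one table row (hypothetical helper).

    ranked: list of (label, score) pairs, best guess first.
    """
    row = {"FILE": file_name, "PAGE": page}
    for n, (label, score) in enumerate(ranked, start=1):
        row[f"CLASS-{n}"] = label
        row[f"SCORE-{n}"] = round(score, 3)
    if true_label is not None:
        row["TRUE"] = true_label  # only known for manually checked data
    return row


# Invented example values, not taken from the result tables.
row = make_row(
    "archive_scan", 12,
    [("TEXT_T", 0.91), ("LINE_T", 0.06), ("TEXT_P", 0.02)],
    true_label="TEXT_T",
)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

A TOP-1 table would then carry a single CLASS/SCORE pair per row, while a TOP-3 table carries three.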

Contacts 📧

For support write to 📧 [email protected] 📧

Official repository: UFAL ^3

Acknowledgements 🙏

Developed by UFAL ^5 👥
Funded by ATRIUM ^4 💰
Shared by ATRIUM ^4 & UFAL ^5
Model type: fine-tuned ViT ^2 with a 224x224 input resolution
