Timothy Chan
---
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
base_model:
- naver-clova-ix/donut-base-finetuned-cord-v2
tags:
- logistics
- document-parsing
---
🏗️ This is a final-year project (FYP) on parsing 🚚 logistics 🚚 shipping documents for system integration.
- https://huggingface.co/uartimcs/donut-booking-extract/blob/main/FYP.pdf
The module versions have been updated so that the program keeps running, because the Donut pretrained model itself has not been updated recently.
**My use case:**
Extract the common key data fields from shipping documents issued by ten different shipping lines.
**Repo & Datasets**
- donut.zip (Original Donut Repo + Labelled Booking Dummy Datasets with JSONL files + Config Files)
- sample-image-to-play.zip (Extra dummy datasets for trying out and testing the model)
https://huggingface.co/spaces/uartimcs/donut-booking-gradio
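
The labelled datasets ship with JSONL files. As a rough sketch of what one label line might look like, the snippet below builds a `metadata.jsonl` entry in the layout the original Donut repo describes (a `file_name` plus a `ground_truth` JSON string containing `gt_parse`). The file name and the field names (`booking_no`, `vessel`, `pol`) are hypothetical and are not taken from the actual datasets in donut.zip.

```python
import json


def make_metadata_line(file_name, fields):
    """Return one metadata.jsonl line for a single labelled image."""
    # The label is stored as a JSON string nested under "ground_truth",
    # with the parsed key/value pairs under "gt_parse".
    ground_truth = json.dumps({"gt_parse": fields})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


# Hypothetical file name and field names, for illustration only.
line = make_metadata_line(
    "booking_0001.png",
    {"booking_no": "ABC1234567", "vessel": "EXAMPLE VESSEL", "pol": "HONG KONG"},
)
print(line)
```

Each image in a split gets one such line, so a split folder would hold the images plus a single `metadata.jsonl`.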
**Colab Notebooks**
- donut-booking-train.ipynb (Train the model in Colab in a T4 GPU / A100 GPU environment)
- donut-booking-run.ipynb (Run the model in Colab with Gradio in a T4 GPU / A100 GPU environment)
**Size of dataset**
The split follows the CORD-v2 dataset ratio (8:1:1):
- train: 800 (80 pics x 10 classes)
- validation: 100 (10 pics x 10 classes)
- test: 100 (10 pics x 10 classes)
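
The split above can be reproduced with a small per-class helper. This is a minimal sketch, assuming 100 images per shipping line; the function name `split_per_class`, the class names, and the file names are made up for illustration and do not come from the project notebooks.

```python
import random


def split_per_class(files_by_class, n_train=80, n_val=10, n_test=10, seed=0):
    """Split each class into train/validation/test with fixed per-class counts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "validation": [], "test": []}
    for cls in sorted(files_by_class):
        files = sorted(files_by_class[cls])
        if len(files) < n_train + n_val + n_test:
            raise ValueError(f"class {cls!r} has too few images: {len(files)}")
        rng.shuffle(files)
        splits["train"] += files[:n_train]
        splits["validation"] += files[n_train:n_train + n_val]
        splits["test"] += files[n_train + n_val:n_train + n_val + n_test]
    return splits


# Hypothetical data: 10 shipping lines with 100 images each.
demo = {
    f"line_{c:02d}": [f"line_{c:02d}_{i:03d}.png" for i in range(100)]
    for c in range(10)
}
splits = split_per_class(demo)
print({name: len(files) for name, files in splits.items()})
# {'train': 800, 'validation': 100, 'test': 100}
```

Splitting per class rather than over the pooled images keeps every shipping line represented in the same proportion in all three sets.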