metadata
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
Document QA Model
This is a document question-answering model fine-tuned from microsoft/layoutlmv3-base. It takes OCR output (tokens and bounding boxes from PaddleOCR) together with the document image and answers questions about structured information in the document layout.
Model Details
Model Description
- Model Name: document-qa-model
- Base Model: microsoft/layoutlmv3-base
- Fine-tuned by: Lakshya Singh (solo contributor)
- Languages: English, Spanish, French, German, Italian
- License: Apache-2.0 (inherited from base model)
- Intended Use: Extract answers to structured queries from scanned documents
- Funding: None; this project was completed independently.
Model Sources
- Repository: GitHub link
- Trained on: Adapted version of nielsr/docvqa_1200_examples
- Model metrics: See
Uses
Direct Use
This model can be used for:
- Question Answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
Out-of-Scope Use
- Not suitable for conversational (multi-turn) QA
- Not suitable for images without OCR output, since the model needs recognized tokens and their bounding boxes as input
Training Details
Dataset
The dataset consisted of:
- Images of utility bills and documents
- OCR data with bounding boxes (from PaddleOCR)
- Queries in English, Spanish, and Chinese
- Answer spans with match scores and positions
Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, positions, and structure
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: Shown in image below
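The preprocessing step can be sketched as follows. LayoutLM-family models expect each token's bounding box as `[x0, y0, x1, y1]` on a 0-1000 scale, while PaddleOCR returns four-point polygons in pixel coordinates. This is a minimal sketch, not the author's training code: the function names are hypothetical, the input is assumed to follow the classic PaddleOCR line format `[polygon, (text, confidence)]`, and giving every word in a line the line-level box is a simplifying assumption.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
OcrLine = Tuple[Sequence[Point], Tuple[str, float]]


def normalize_box(polygon: Sequence[Point], width: int, height: int) -> List[int]:
    """Convert a 4-point pixel polygon to [x0, y0, x1, y1] on a 0-1000 scale."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]

    def scale(value: float, size: int) -> int:
        # Clamp so that slightly out-of-bounds OCR points stay in [0, 1000].
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(min(xs), width),
        scale(min(ys), height),
        scale(max(xs), width),
        scale(max(ys), height),
    ]


def ocr_to_words_and_boxes(
    ocr_lines: Sequence[OcrLine], width: int, height: int
) -> Tuple[List[str], List[List[int]]]:
    """Split each OCR line into words; every word reuses the line's box (assumption)."""
    words: List[str] = []
    boxes: List[List[int]] = []
    for polygon, (text, _confidence) in ocr_lines:
        box = normalize_box(polygon, width, height)
        for word in text.split():
            words.append(word)
            boxes.append(box)
    return words, boxes


# Hypothetical OCR result for a 1000x500 pixel image.
sample = [([[100, 50], [300, 50], [300, 100], [100, 100]], ("Total due", 0.99))]
words, boxes = ocr_to_words_and_boxes(sample, width=1000, height=500)
```

The resulting `words` and `boxes` lists are the two inputs a LayoutLMv3 processor needs when its built-in OCR is disabled.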
Training Metrics
Evaluation
Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap vs ground truth
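The card does not say exactly how F1 and token overlap were computed. One common choice for extractive QA is the SQuAD-style token-level F1, sketched below under that assumption; `token_f1` is a hypothetical helper name, and the lowercase, whitespace-split normalization is a simplification.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and the ground-truth answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted tokens match both ground-truth tokens:
# precision 2/3, recall 1, F1 = 0.8.
score = token_f1("total 45.00 usd", "45.00 USD")
```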
Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
How to Use
- Available on my GitHub
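For orientation, inference with the `transformers` LayoutLMv3 classes might look like the sketch below. It is not the author's script: `answer_question` and `best_span` are hypothetical helpers, `model_id` is a placeholder for wherever the fine-tuned weights are published, and the checkpoint is assumed to ship its processor files. The `words` and `boxes` arguments are OCR tokens with boxes already normalized to the 0-1000 scale.

```python
from typing import List, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score."""
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        # Only consider spans that end at or after the start and stay short.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best


def answer_question(
    model_id: str,
    image_path: str,
    question: str,
    words: List[str],
    boxes: List[List[int]],
) -> str:
    """Run extractive QA on one document image (sketch; model_id is a placeholder)."""
    # Heavy dependencies are imported lazily so best_span stays usable without them.
    import torch
    from PIL import Image
    from transformers import LayoutLMv3ForQuestionAnswering, LayoutLMv3Processor

    # apply_ocr=False because words and boxes come from an external OCR step.
    processor = LayoutLMv3Processor.from_pretrained(model_id, apply_ocr=False)
    model = LayoutLMv3ForQuestionAnswering.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**encoding)

    # Note: this simple search does not exclude question tokens from the span.
    start, end = best_span(
        outputs.start_logits[0].tolist(), outputs.end_logits[0].tolist()
    )
    answer_ids = encoding["input_ids"][0][start : end + 1]
    return processor.tokenizer.decode(answer_ids, skip_special_tokens=True).strip()
```

Scoring start and end jointly, rather than taking two independent argmaxes, avoids spans whose end precedes their start.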