Document QA Model
This is a fine-tuned document question-answering model based on layoutlmv3-base. It is trained to understand documents using OCR data (via PaddleOCR) and accurately answer questions related to structured information in the document layout.
Model Details
Model Description
- Model Name: document-qa-model
- Base Model: microsoft/layoutlmv3-base
- Fine-tuned by: Lakshya Singh (solo contributor)
- Languages: English, Spanish, French, German, Italian
- License: Apache-2.0 (inherited from base model)
- Intended Use: Extract answers to structured queries from scanned documents
- Funding: None; this project was completed independently.
Model Sources
- Repository: GitHub Link
- Trained on: Adapted version of nielsr/docvqa_1200_examples
- Model metrics: See
Uses
Direct Use
This model can be used for:
- Question Answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
Out-of-Scope Use
- Not suitable for conversational QA
- Not suitable for images with no OCR-processed text
Training Details
Dataset
The dataset consisted of:
- Images of utility bills and documents
- OCR data with bounding boxes (from PaddleOCR)
- Queries in English, Spanish, and Chinese
- Answer spans with match scores and positions
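The card does not describe how answer spans and their match scores were computed. As a rough illustration only, the sketch below locates an answer string inside a list of OCR words by sliding a window over the words and scoring each window with `difflib.SequenceMatcher`. The function name `find_answer_span`, the `min_score` threshold, and the window sizes are assumptions made for this example, not the author's method.

```python
from difflib import SequenceMatcher


def find_answer_span(words, answer, min_score=0.8):
    """Return (start, end, score) for the word window closest to `answer`.

    `start` and `end` are inclusive word indices. Returns None when no
    window reaches `min_score`. Hypothetical helper for illustration.
    """
    target = answer.lower().split()
    if not target or not words:
        return None
    lowered = [w.lower() for w in words]
    target_text = " ".join(target)
    best = None
    # Try windows one word shorter, equal to, and one word longer than the
    # answer, since OCR may merge or split tokens.
    for size in (len(target) - 1, len(target), len(target) + 1):
        if size < 1 or size > len(lowered):
            continue
        for start in range(len(lowered) - size + 1):
            window = " ".join(lowered[start:start + size])
            score = SequenceMatcher(None, window, target_text).ratio()
            if best is None or score > best[2]:
                best = (start, start + size - 1, score)
    if best is None or best[2] < min_score:
        return None
    return best


words = ["Total", "amount", "due:", "$42.10", "by", "March", "3"]
span = find_answer_span(words, "march 3")
```

A character-level ratio tolerates small OCR errors such as a missing punctuation mark, at the cost of occasionally matching a near-duplicate string elsewhere on the page.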
Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, positions, and structure
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: Shown in image below
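LayoutLM-family models expect each bounding box as `[x0, y0, x1, y1]` scaled to a 0-1000 range relative to the page size. The sketch below shows one way to convert OCR detections into that form. It assumes each detection is a `(quad, text)` pair where `quad` is a four-point polygon in pixels, which is how PaddleOCR detections are commonly shaped. The helper names `normalize_box` and `to_words_and_boxes` are hypothetical and do not come from the repository.

```python
def normalize_box(quad, width, height):
    """Convert a four-point pixel polygon to [x0, y0, x1, y1] on a 0-1000 scale."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]

    def scale(value, size):
        # Clamp so that slightly out-of-page coordinates stay within 0-1000.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(min(xs), width),
        scale(min(ys), height),
        scale(max(xs), width),
        scale(max(ys), height),
    ]


def to_words_and_boxes(detections, width, height):
    """Split (quad, text) detections into parallel text and box lists."""
    words, boxes = [], []
    for quad, text in detections:
        text = text.strip()
        if not text:
            continue  # skip empty OCR results
        words.append(text)
        boxes.append(normalize_box(quad, width, height))
    return words, boxes


detections = [
    ([[100, 50], [300, 50], [300, 90], [100, 90]], "Account Number"),
    ([[320, 50], [480, 50], [480, 90], [320, 90]], "  "),
]
words, boxes = to_words_and_boxes(detections, width=1000, height=500)
```

Taking the minimum and maximum of the polygon corners yields an axis-aligned box even when the detected text is slightly rotated.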
Training Metrics
Evaluation
Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap vs ground truth
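The exact metric implementation is not given here. A common way to compute an F1 score from token overlap in extractive QA is the SQuAD-style token F1, sketched below with a simplified normalization (lowercasing and punctuation stripping). The function names and the normalization choices are assumptions for this example.

```python
from collections import Counter
import string


def normalize_answer(text):
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def token_f1(prediction, ground_truth):
    """Token-level F1 between a predicted answer and the reference answer."""
    pred = normalize_answer(prediction)
    gold = normalize_answer(ground_truth)
    if not pred or not gold:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


score = token_f1("Acme Power", "Acme Power Company")
```

Because the score ignores token order, it rewards partially correct spans but cannot distinguish a reordered answer from an exact one.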
Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
How to Use
- Available on my GitHub
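For readers who want to try the model through the `transformers` library, the following is a minimal sketch, not the author's published usage code. It assumes that the checkpoint is hosted under the ID `lakshya-rawat/document-qa-model`, that it loads with `LayoutLMv3ForQuestionAnswering`, and that the processor can be taken from the base model. `words` and `boxes` are expected to come from an external OCR step, with boxes already scaled to 0-1000. The helper names `best_span` and `answer_question` are hypothetical.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) token pair with the highest combined logit."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def answer_question(image_path, question, words, boxes,
                    model_id="lakshya-rawat/document-qa-model"):
    """Run extractive QA on one page. Model ID and head type are assumptions."""
    # Heavy imports stay local so best_span is usable without them.
    import torch
    from PIL import Image
    from transformers import LayoutLMv3ForQuestionAnswering, LayoutLMv3Processor

    # apply_ocr=False because words and boxes are supplied by the caller.
    processor = LayoutLMv3Processor.from_pretrained(
        "microsoft/layoutlmv3-base", apply_ocr=False
    )
    model = LayoutLMv3ForQuestionAnswering.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, question, words, boxes=boxes,
                         truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoding)

    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist())
    token_ids = encoding["input_ids"][0][start:end + 1]
    return processor.tokenizer.decode(token_ids, skip_special_tokens=True).strip()
```

This sketch does not restrict the span search to document tokens, so a production version would likely mask out the question tokens before calling `best_span`.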