---
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
---

# Document QA Model

This is a fine-tuned **document question-answering model** based on `layoutlmv3-base`. It is trained on OCR output (via PaddleOCR) and answers questions about structured information in a document's layout.

---

## Model Details

### Model Description

- **Model Name:** `document-qa-model`
- **Base Model:** [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base)
- **Fine-tuned by:** Lakshya Singh (solo contributor)
- **Languages:** English, Spanish, French, German, Italian
- **License:** Apache-2.0 (inherited from the base model)
- **Intended Use:** Extracting answers to structured queries from scanned documents
- **Funding:** None; this project was completed independently.
---

## Model Sources

- **Repository:** [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- **Trained on:** An adapted version of [`nielsr/docvqa_1200_examples`](https://huggingface.co/datasets/nielsr/docvqa_1200_examples)
- **Model metrics:**

  ![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)

---

## Uses

### Direct Use

This model can be used for:

- Question answering on document images (PDFs, invoices, utility bills)
- Information extraction using OCR and layout-aware understanding

### Out-of-Scope Use

- Not suitable for conversational QA
- Not suitable for images without OCR-processed text

---

## Training Details

### Dataset

The dataset consisted of:

- **Images** of utility bills and other documents
- **OCR data** with bounding boxes (from PaddleOCR)
- **Queries** in English, Spanish, and Chinese
- **Answer spans** with match scores and positions

### Training Procedure

- Preprocessing: PaddleOCR was used to extract tokens, positions, and document structure
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: shown in the chart below

### Training Metrics

Validation F1, loss, and learning rate over training:

![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)

---

## Evaluation

### Metrics Used

- F1 score
- Match score of predicted spans
- Token overlap against ground truth

### Summary

The model performs well on document-style QA tasks, especially with:

- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms

---

## How to Use

The full training and inference code is available on [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model).
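As a minimal sketch (not taken from the repository), inference with externally supplied OCR results could look like the following. The checkpoint path, image file name, and example words/boxes are placeholders, and the code assumes the fine-tuned weights were saved in the standard `transformers` format with an extractive QA head (`LayoutLMv3ForQuestionAnswering`). LayoutLMv3 expects bounding boxes normalized to a 0-1000 coordinate space, so a small helper converts PaddleOCR's pixel coordinates first.

```python
from PIL import Image
import torch
from transformers import AutoProcessor, LayoutLMv3ForQuestionAnswering


def normalize_box(box, width, height):
    """Scale an absolute pixel box [x0, y0, x1, y1] to the 0-1000
    coordinate range that LayoutLMv3 expects."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def answer_question(model_dir, image_path, question, words, pixel_boxes):
    """Run extractive QA over one document image.

    `words` and `pixel_boxes` come from an external OCR engine such as
    PaddleOCR, so the processor is loaded with apply_ocr=False.
    """
    processor = AutoProcessor.from_pretrained(
        "microsoft/layoutlmv3-base", apply_ocr=False
    )
    model = LayoutLMv3ForQuestionAnswering.from_pretrained(model_dir)

    image = Image.open(image_path).convert("RGB")
    boxes = [normalize_box(b, *image.size) for b in pixel_boxes]

    # Question goes in as `text`, OCR words as `text_pair`.
    encoding = processor(
        image, question, words, boxes=boxes, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**encoding)

    # Decode the highest-scoring start/end span back to text.
    start = outputs.start_logits.argmax(-1).item()
    end = outputs.end_logits.argmax(-1).item()
    return processor.tokenizer.decode(
        encoding["input_ids"][0][start : end + 1]
    ).strip()
```

A call might look like `answer_question("path/to/document-qa-model", "invoice.png", "What is the total due?", words, pixel_boxes)`, where `words` and `pixel_boxes` are the token strings and `[x0, y0, x1, y1]` pixel boxes returned by PaddleOCR for that image.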