metadata
library_name: transformers
tags:
  - document-question-answering
  - layoutlmv3
  - ocr
  - document-understanding
  - paddleocr
  - multilingual
  - layout-aware
  - lakshya-singh
license: apache-2.0
language:
  - en
base_model:
  - microsoft/layoutlmv3-base
datasets:
  - nielsr/docvqa_1200_examples

Document QA Model

This is a document question-answering model fine-tuned from microsoft/layoutlmv3-base. It takes a document image together with OCR text and word bounding boxes (extracted with PaddleOCR) and answers questions about structured information in the document by locating the answer span in the OCR text.


Model Details

Model Description

  • Model Name: document-qa-model
  • Base Model: microsoft/layoutlmv3-base
  • Fine-tuned by: Lakshya Singh (solo contributor)
  • Languages: English, Spanish, French, German, Italian
  • License: Apache-2.0
  • Intended Use: Extract answers to structured queries from scanned documents
  • Funding: None; this project was completed independently.

Model Sources


Uses

Direct Use

This model can be used for:

  • Question Answering on document images (PDFs, invoices, utility bills)
  • Information extraction tasks using OCR and layout-aware understanding

Out-of-Scope Use

  • Not suitable for conversational QA
  • Not suitable for images without OCR-extractable text, since the model relies on OCR tokens and bounding boxes as input

Training Details

Dataset

The dataset consisted of:

  • Images of utility bills and documents
  • OCR data with bounding boxes (from PaddleOCR)
  • Queries in English, Spanish, and Chinese
  • Answer spans with match scores and positions
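The card does not say how answer positions and match scores were computed. One common approach, assumed here rather than confirmed, is to slide a window over the OCR words and score each candidate against the annotated answer with a fuzzy string ratio. The helper name `find_answer_span` is hypothetical:

```python
from difflib import SequenceMatcher


def find_answer_span(words, answer):
    """Return (start_idx, end_idx, score) for the word window best matching `answer`.

    Hypothetical helper (not from this model's code): it compares every window
    with the same number of words as the answer and scores it with difflib's
    similarity ratio, which ranges from 0.0 to 1.0.
    """
    target = " ".join(answer.lower().split())
    n = len(target.split())
    if n == 0 or n > len(words):
        return None
    norm = [w.lower() for w in words]
    best = None
    for start in range(len(words) - n + 1):
        window = " ".join(norm[start:start + n])
        score = SequenceMatcher(None, window, target).ratio()
        # Strict ">" keeps the earliest window when scores tie.
        if best is None or score > best[2]:
            best = (start, start + n - 1, score)
    return best
```

For example, with OCR words `["Account", "No:", "12345", "Total", "Due:", "$48.20"]`, the answer "total due" maps to word indices 3 to 4 with a score just under 1.0, because of the trailing colon on "Due:".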

Training Procedure

  • Preprocessing: PaddleOCR was used to extract tokens, positions, and structure
  • Model: LayoutLMv3-base
  • Epochs: 4
  • Learning rate schedule: see training_history.png
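LayoutLMv3 expects one bounding box per input word, given as `[x0, y0, x1, y1]` on a 0-1000 grid relative to the page size. The sketch below converts PaddleOCR line results into that form. It assumes the PaddleOCR 2.x per-line output `[[[x1, y1], ..., [x4, y4]], (text, confidence)]`, and the function name `paddle_to_layoutlm` is hypothetical:

```python
def paddle_to_layoutlm(lines, width, height):
    """Convert PaddleOCR-style line results into (words, boxes) for LayoutLMv3.

    Assumes each entry looks like [[[x1, y1], ..., [x4, y4]], (text, confidence)].
    Boxes are returned as [x0, y0, x1, y1] scaled to a 0-1000 grid.
    """
    words, boxes = [], []
    for points, (text, _conf) in lines:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Axis-aligned box enclosing the (possibly rotated) quadrilateral.
        x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
        box = [
            int(1000 * x0 / width),
            int(1000 * y0 / height),
            int(1000 * x1 / width),
            int(1000 * y1 / height),
        ]
        # Clamp in case an OCR box spills past the image edge.
        box = [min(max(v, 0), 1000) for v in box]
        # Split each OCR line into words; every word inherits the line's box.
        for token in text.split():
            words.append(token)
            boxes.append(box)
    return words, boxes
```

Giving every word the box of its whole OCR line is a simplification; splitting the line box proportionally per word would be more precise but is not shown here.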

Training Metrics

  • F1 Score (validation): training_history.png
  • Loss & Learning Rate Chart: training_history.png

Evaluation

Metrics Used

  • F1 score
  • Match score of predicted spans
  • Token overlap vs ground truth
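Token overlap F1 is typically computed as in SQuAD: precision and recall over the multiset of tokens shared by the prediction and the reference. A minimal sketch follows. Unlike the official SQuAD script, it only lowercases and splits on whitespace (no punctuation or article stripping), so treat it as an approximation of whatever normalisation was used for this model:

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """SQuAD-style token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty scores zero.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection: each shared token counts at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("Total Due $48.20", "$48.20")` gives precision 1/3 and recall 1, so F1 = 0.5.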

Summary

The model performs well on document-style QA tasks, especially with:

  • Clearly structured OCR results
  • Document types similar to utility bills, invoices, and forms

How to Use
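`LayoutLMv3ForQuestionAnswering` in transformers returns `start_logits` and `end_logits` over the input tokens, and the predicted answer is the token span between the chosen start and end positions. That post-processing step can be sketched in plain Python as below; `best_span` and the `max_answer_len` default of 30 are illustrative choices, not settings taken from this model:

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) maximising start_logits[s] + end_logits[e].

    Only spans with start <= end and at most `max_answer_len` tokens are
    considered, the usual decoding rule for extractive QA heads.
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best
```

In practice you would pass the logits for a single example (for instance `outputs.start_logits[0].tolist()`), restrict the search to tokens that come from the document rather than the question, and map the token indices back to OCR words with `encoding.word_ids()` on the processor output.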