
πŸ“„ LayoutLMv3 Fine-Tuned on FUNSD for Key-Value Pair Extraction

Model Details

  • Developed by: nnul
  • Model type: LayoutLMv3 (microsoft/layoutlmv3-base)
  • Language(s): English
  • License: Apache 2.0
  • Fine-tuned from: microsoft/layoutlmv3-base

This model is a fine-tuned version of LayoutLMv3 on the FUNSD dataset. It has been trained for the task of form understanding, specifically token classification for extracting structured information from scanned forms (e.g., questions and answers in a key-value format).


Model Description

The model performs token-level classification, labeling each token as one of:

  • QUESTION
  • ANSWER
  • HEADER
  • O (other)

It takes as input a scanned form image and its OCR-extracted tokens and bounding boxes.
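LayoutLMv3 expects each bounding box in a 0-1000 coordinate space normalized to the page size, so pixel coordinates from an OCR engine must be rescaled first. A minimal sketch of that rescaling; the helper name `normalize_box` is illustrative, not part of the model:

```python
def normalize_box(box, width, height):
    """Scale a pixel-space box [x0, y0, x1, y1] to LayoutLMv3's 0-1000 space."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# A box on a 1000x800-pixel page
print(normalize_box([100, 100, 300, 200], 1000, 800))  # [100, 125, 300, 250]
```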


Uses

Direct Use

  • Key-value pair extraction from scanned documents
  • Form understanding
  • Preprocessing step for document-based QA, autofill, or RPA systems

Downstream Use

  • Automating information extraction from forms
  • Fine-tuning on custom form datasets (insurance, tax, invoices, etc.)

Out-of-Scope Use

  • Documents not structured like forms
  • Non-English documents (the model was not trained on multilingual data)
  • Highly noisy OCR output (e.g., handwritten text)

Bias, Risks, and Limitations

  • Biased toward the structure and layout of FUNSD forms (U.S.-centric, clean typewritten documents).
  • May perform poorly on handwritten or low-quality scans.
  • Assumes accurate OCR input.

How to Get Started

import torch
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
from PIL import Image

# Load model and processor; apply_ocr=False because we supply our own words and boxes
model = LayoutLMv3ForTokenClassification.from_pretrained("nnul/layoutlmv3-finetuned-funsd")
processor = LayoutLMv3Processor.from_pretrained("nnul/layoutlmv3-finetuned-funsd", apply_ocr=False)

# Load the form image plus OCR-extracted tokens and their bounding boxes
# (boxes must be in the 0-1000 normalized coordinate space LayoutLMv3 expects)
image = Image.open("your_form.jpg").convert("RGB")
words = ["Name", ":", "John", "Doe"]
boxes = [[100, 100, 150, 120], [155, 100, 160, 120], [165, 100, 220, 120], [225, 100, 270, 120]]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)
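The raw prediction ids can be mapped back to label names via `model.config.id2label` and then grouped into key-value pairs. A minimal sketch of that grouping, assuming the simplified label set listed above (no B-/I- prefixes) and tokens in reading order; `pair_key_values` is an illustrative helper, not part of the model:

```python
def pair_key_values(words, labels):
    """Group consecutive QUESTION tokens with the ANSWER tokens that follow them."""
    pairs, key, value = [], [], []
    for word, label in zip(words, labels):
        if label == "QUESTION":
            if key and value:  # a completed key-value pair precedes this new key
                pairs.append((" ".join(key), " ".join(value)))
                key, value = [], []
            key.append(word)
        elif label == "ANSWER":
            value.append(word)
    if key and value:
        pairs.append((" ".join(key), " ".join(value)))
    return pairs

words = ["Name", ":", "John", "Doe", "Date", ":", "2024"]
labels = ["QUESTION", "QUESTION", "ANSWER", "ANSWER", "QUESTION", "QUESTION", "ANSWER"]
print(pair_key_values(words, labels))  # [('Name :', 'John Doe'), ('Date :', '2024')]
```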

Training Details

Training Data

  • FUNSD Dataset
  • 199 forms (149 training, 50 test), annotated with token-level BIO labels

Training Hyperparameters

  • Epochs: 7
  • Learning rate: default
  • Batch size: 2
  • Optimizer: AdamW
  • Training time: ~5 minutes on A100 (Colab)

Evaluation

Label          Precision   Recall   F1-Score   Support
ANSWER         0.90        0.93     0.92       817
HEADER         0.67        0.64     0.66       119
QUESTION       0.91        0.94     0.93       1077
Micro Avg      0.90        0.92     0.91       2013
Macro Avg      0.83        0.84     0.83       2013
Weighted Avg   0.89        0.92     0.91       2013
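The averages follow the standard definitions: macro F1 is the unweighted mean of per-label F1, and weighted F1 weights each label by its support. A quick check using the rounded per-label figures from the table (the macro value lands at 0.84 here only because the table was computed from unrounded scores):

```python
# Per-label F1 and support, as reported in the table above
f1 = {"ANSWER": 0.92, "HEADER": 0.66, "QUESTION": 0.93}
support = {"ANSWER": 817, "HEADER": 119, "QUESTION": 1077}

macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[k] * support[k] for k in f1) / sum(support.values())

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.8367 0.91
```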

Environmental Impact

Parameter        Value
Hardware Used    NVIDIA A100 GPU (Colab)
Training Time    ~5 minutes
Cloud Provider   Google Colab
Carbon Emitted   Negligible

Citation

@misc{layoutlmv3-funsd,
  title={LayoutLMv3 Fine-tuned on FUNSD},
  author={nnul},
  year={2025},
  howpublished={\url{https://huggingface.co/nnul/layoutlmv3-finetuned-funsd}},
  note={Fine-tuned LayoutLMv3 for key-value extraction from forms}
}