# LayoutLMv3 Fine-Tuned on FUNSD for Key-Value Pair Extraction
## Model Details

- Developed by: nnul
- Model type: LayoutLMv3 (`microsoft/layoutlmv3-base`)
- Language(s): English
- License: Apache 2.0
- Fine-tuned from: `microsoft/layoutlmv3-base`
This model is a fine-tuned version of LayoutLMv3 on the FUNSD dataset. It has been trained for the task of form understanding, specifically token classification for extracting structured information from scanned forms (e.g., questions and answers in a key-value format).
## Model Description

The model performs token-level classification, labeling each token as one of:

- QUESTION
- ANSWER
- HEADER
- O (other)
It takes as input a scanned form image and its OCR-extracted tokens and bounding boxes.
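LayoutLMv3 expects each bounding box in a normalized 0-1000 coordinate space, so pixel-space OCR boxes must be rescaled first. A minimal sketch of that step (the helper name `normalize_box` is illustrative, not part of this model's API):

```python
def normalize_box(box, width, height):
    """Rescale a pixel-space [x0, y0, x1, y1] box to LayoutLM's 0-1000 space."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Example: a word box on a 1000x800-pixel page
print(normalize_box([100, 100, 200, 140], 1000, 800))  # → [100, 125, 200, 175]
```

Apply this to every OCR box before passing `boxes` to the processor.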
## Model Sources
- Dataset: nielsr/funsd-layoutlmv3
- Base model: microsoft/layoutlmv3-base
## Uses

### Direct Use
- Key-value pair extraction from scanned documents
- Form understanding
- Preprocessing step for document-based QA, autofill, or RPA systems
### Downstream Use
- Automating information extraction from forms
- Fine-tuning on custom form datasets (insurance, tax, invoices, etc.)
### Out-of-Scope Use
- Documents not structured like forms
- Non-English documents (was not trained on multilingual data)
- Highly noisy OCR (e.g., handwriting)
## Bias, Risks, and Limitations
- Biased toward the structure and layout of FUNSD forms (U.S.-centric, clean typewritten documents).
- May perform poorly on handwritten or low-quality scans.
- Assumes accurate OCR input.
## How to Get Started

```python
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
from PIL import Image

# Load model and processor; apply_ocr=False because we supply words and boxes ourselves
model = LayoutLMv3ForTokenClassification.from_pretrained("nnul/layoutlmv3-finetuned-funsd")
processor = LayoutLMv3Processor.from_pretrained(
    "nnul/layoutlmv3-finetuned-funsd", apply_ocr=False
)

# Load the form image and its OCR-extracted tokens and bounding boxes
# (boxes must be normalized to a 0-1000 coordinate space)
image = Image.open("your_form.jpg").convert("RGB")
words = ["Name", ":", "John", "Doe"]
boxes = [[100, 100, 150, 120], [155, 100, 160, 120], [165, 100, 220, 120], [225, 100, 270, 120]]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)

# Map predicted label ids back to label names
labels = [model.config.id2label[p.item()] for p in predictions[0]]
```
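For key-value extraction, the token-level predictions still need to be grouped into (question, answer) pairs. A minimal sketch of that post-processing, assuming BIO-style labels (`B-QUESTION`, `I-ANSWER`, ...) and that each question span is immediately followed by its answer span; the helper name `extract_pairs` is illustrative:

```python
def extract_pairs(words, labels):
    """Group word-level BIO labels into contiguous entity spans, then pair
    each QUESTION span with the ANSWER span that directly follows it."""
    spans = []  # list of [entity_type, text]
    for word, label in zip(words, labels):
        if label == "O":
            continue
        prefix, _, entity = label.partition("-")  # "B-QUESTION" -> ("B", "QUESTION")
        if prefix == "B" or not spans or spans[-1][0] != entity:
            spans.append([entity, word])       # start a new span
        else:
            spans[-1][1] += " " + word         # continue the current span
    pairs = []
    for i, (entity, text) in enumerate(spans):
        if entity == "QUESTION" and i + 1 < len(spans) and spans[i + 1][0] == "ANSWER":
            pairs.append((text, spans[i + 1][1]))
    return pairs

words = ["Name", ":", "John", "Doe"]
labels = ["B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]
print(extract_pairs(words, labels))  # → [('Name :', 'John Doe')]
```

Real forms can interleave spans less predictably, so production pipelines often also use the boxes to match answers to the nearest question.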
## Training Details

### Training Data

- FUNSD dataset: 199 annotated forms (149 train / 50 test) with token-level BIO labels

### Training Hyperparameters

- Epochs: 7
- Learning rate: default
- Batch size: 2
- Optimizer: AdamW
- Training time: ~5 minutes on an A100 (Colab)
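The hyperparameters above could be reproduced with a `transformers` Trainer configuration along these lines. This is a sketch only: the exact training script is not published, and the `output_dir` name is illustrative.

```python
from transformers import TrainingArguments

# Configuration mirroring the hyperparameters listed above;
# learning rate is left at the Trainer default, and the Trainer's
# default optimizer is AdamW.
args = TrainingArguments(
    output_dir="layoutlmv3-finetuned-funsd",
    num_train_epochs=7,             # Epochs: 7
    per_device_train_batch_size=2,  # Batch size: 2
)
```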
## Evaluation

| Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ANSWER | 0.90 | 0.93 | 0.92 | 817 |
| HEADER | 0.67 | 0.64 | 0.66 | 119 |
| QUESTION | 0.91 | 0.94 | 0.93 | 1077 |
| Micro Avg | 0.90 | 0.92 | 0.91 | 2013 |
| Macro Avg | 0.83 | 0.84 | 0.83 | 2013 |
| Weighted Avg | 0.89 | 0.92 | 0.91 | 2013 |
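The weighted average rows follow directly from the per-label scores; for example, the weighted F1 is the support-weighted mean of the per-label F1 scores:

```python
# Per-label F1 and support, taken from the evaluation table above
f1 = {"ANSWER": 0.92, "HEADER": 0.66, "QUESTION": 0.93}
support = {"ANSWER": 817, "HEADER": 119, "QUESTION": 1077}

total = sum(support.values())  # 2013
weighted_f1 = sum(f1[k] * support[k] for k in f1) / total
print(round(weighted_f1, 2))  # → 0.91
```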
## Environmental Impact

| Parameter | Value |
|---|---|
| Hardware Used | NVIDIA A100 GPU (Colab) |
| Training Time | ~5 minutes |
| Cloud Provider | Google Colab |
| Carbon Emitted | Negligible |
## Citation

```bibtex
@misc{layoutlmv3-funsd,
  title={LayoutLMv3 Fine-tuned on FUNSD},
  author={nnul},
  year={2025},
  howpublished={\url{https://huggingface.co/nnul/layoutlmv3-finetuned-funsd}},
  note={Fine-tuned LayoutLMv3 for key-value extraction from forms}
}
```