---
model-index:
- name: layoutlmv3-base-finetuned-rvlcdip
  results:
  - task:
      type: document-image-classification
      name: document-image-classification
    dataset:
      name: rvl-cdip
      type: amazon-ocr
    metrics:
    - type: evaluation_loss
      value: 0.1856316477060318
      name: Evaluation Loss
    - type: accuracy
      value: 0.9519237980949524
      name: Evaluation Accuracy
    - type: weighted_f1
      value: 0.9518911690649716
      name: Evaluation Weighted F1
    - type: micro_f1
      value: 0.9519237980949524
      name: Evaluation Micro F1
    - type: macro_f1
      value: 0.9518042570370386
      name: Evaluation Macro F1
    - type: weighted_recall
      value: 0.9519237980949524
      name: Evaluation Weighted Recall
    - type: micro_recall
      value: 0.9519237980949524
      name: Evaluation Micro Recall
    - type: macro_recall
      value: 0.9518171728908463
      name: Evaluation Macro Recall
    - type: weighted_precision
      value: 0.9519094862975979
      name: Evaluation Weighted Precision
    - type: micro_precision
      value: 0.9519237980949524
      name: Evaluation Micro Precision
    - type: macro_precision
      value: 0.9518423447239385
      name: Evaluation Macro Precision
    - type: runtime
      value: 514.7031
      name: Evaluation Runtime (seconds)
    - type: samples_per_second
      value: 77.713
      name: Evaluation Samples per Second
    - type: steps_per_second
      value: 1.214
      name: Evaluation Steps per Second
---

# layoutlmv3-base-finetuned-rvlcdip

This model is a fine-tuned version of microsoft/layoutlmv3-base on the [RVL-CDIP dataset](https://adamharley.com/rvl-cdip/) processed using Amazon OCR.

The following metrics were computed on the evaluation set after the final optimization step:

* Evaluation Loss: 0.1856316477060318
* Evaluation Accuracy: 0.9519237980949524
* Evaluation Weighted F1: 0.9518911690649716
* Evaluation Micro F1: 0.9519237980949524
* Evaluation Macro F1: 0.9518042570370386
* Evaluation Weighted Recall: 0.9519237980949524
* Evaluation Micro Recall: 0.9519237980949524
* Evaluation Macro Recall: 0.9518171728908463
* Evaluation Weighted Precision: 0.9519094862975979
* Evaluation Micro Precision: 0.9519237980949524
* Evaluation Macro Precision: 0.9518423447239385
* Evaluation Runtime (seconds): 514.7031
* Evaluation Samples per Second: 77.713
* Evaluation Steps per Second: 1.214

## Training logs

See the wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok

### Training arguments

The following arguments were passed to the Trainer (a reconstructed `TrainingArguments` sketch is given at the end of this card):

- Output Directory: ./results
- Maximum Steps: 20000
- Per Device Train Batch Size: 32 (reduced from the paper's 64 due to CUDA memory constraints; training used 2 GPUs, for an effective batch size of 32 × 2 = 64)
- Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
- Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is used for DocVQA, so the default is assumed here)
- Weight Decay: 0 (not specified in the paper for RVL-CDIP, though 0.05 is used for PubLayNet; the default is assumed here)
- Evaluation Strategy: steps
- Evaluation Steps: 1000
- Evaluate on Start: True
- Save Strategy: steps
- Save Steps: 1000
- Save Total Limit: 5
- Learning Rate: 2e-5
- Load Best Model at End: True
- Metric for Best Model: accuracy
- Greater is Better: True
- Report to: wandb (log to Weights & Biases)
- Logging Steps: 1000
- Logging First Step: True
- Learning Rate Scheduler Type: cosine (not mentioned in the paper, but the PubLayNet GitHub example uses 'cosine')
- FP16: True (due to CUDA memory constraints)
- Dataloader Number of Workers: 4 (number of subprocesses used for data loading)
- DDP Find Unused Parameters: True

### Framework versions

- Transformers 4.42.3
- PyTorch 2.2.0+cu121
- Datasets 2.14.0
- Tokenizers 0.19.1
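
## Reconstructed TrainingArguments (sketch)

For reference, the argument list above corresponds roughly to the following `transformers.TrainingArguments`. This is a reconstruction from the list, not the original training script; parameter names assume Transformers 4.42.

```python
from transformers import TrainingArguments

# Reconstruction of the arguments listed under "Training arguments" above.
training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",            # called `evaluation_strategy` in older releases
    eval_steps=1000,
    eval_on_start=True,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```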
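
## How to use

A minimal inference sketch, assuming the processor was saved alongside this checkpoint (otherwise load it from `microsoft/layoutlmv3-base`). The `model_id`, image path, and OCR words/boxes below are placeholders, not values from this repository.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForSequenceClassification

model_id = "layoutlmv3-base-finetuned-rvlcdip"  # replace with the full Hub id of this checkpoint

processor = AutoProcessor.from_pretrained(model_id, apply_ocr=False)
model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)
model.eval()

image = Image.open("document.png").convert("RGB")

# This checkpoint was fine-tuned on Amazon OCR output, so supply your own OCR
# words and boxes (boxes normalized to LayoutLMv3's 0-1000 coordinate space).
# Setting apply_ocr=True instead would run Tesseract via pytesseract.
words = ["Invoice", "Total", "$42.00"]                                   # placeholder OCR words
boxes = [[70, 45, 180, 70], [70, 500, 140, 525], [150, 500, 230, 525]]  # placeholder boxes

encoding = processor(image, words, boxes=boxes, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits

predicted_class = logits.argmax(-1).item()
# id2label maps to the 16 RVL-CDIP classes if label names were saved in the config.
print(model.config.id2label[predicted_class])
```

Because the model was trained on Amazon OCR tokens, feeding it words and boxes from a different OCR engine (including the processor's built-in Tesseract path) may shift accuracy relative to the evaluation numbers reported above.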