---
model-index:
- name: layoutlmv3-base-finetuned-rvlcdip
  results:
  - task:
      type: document-image-classification
      name: document-image-classification
    dataset:
      name: rvl-cdip
      type: amazon-ocr
    metrics:
    - type: evaluation_loss
      value: 0.1856316477060318
      name: Evaluation Loss
    - type: accuracy
      value: 0.9519237980949524
      name: Evaluation Accuracy
    - type: weighted_f1
      value: 0.9518911690649716
      name: Evaluation Weighted F1
    - type: micro_f1
      value: 0.9519237980949524
      name: Evaluation Micro F1
    - type: macro_f1
      value: 0.9518042570370386
      name: Evaluation Macro F1
    - type: weighted_recall
      value: 0.9519237980949524
      name: Evaluation Weighted Recall
    - type: micro_recall
      value: 0.9519237980949524
      name: Evaluation Micro Recall
    - type: macro_recall
      value: 0.9518171728908463
      name: Evaluation Macro Recall
    - type: weighted_precision
      value: 0.9519094862975979
      name: Evaluation Weighted Precision
    - type: micro_precision
      value: 0.9519237980949524
      name: Evaluation Micro Precision
    - type: macro_precision
      value: 0.9518423447239385
      name: Evaluation Macro Precision
    - type: runtime
      value: 514.7031
      name: Evaluation Runtime (seconds)
    - type: samples_per_second
      value: 77.713
      name: Evaluation Samples per Second
    - type: steps_per_second
      value: 1.214
      name: Evaluation Steps per Second
---

# layoutlmv3-base-finetuned-rvlcdip

This model is a fine-tuned version of microsoft/layoutlmv3-base on the [RVL-CDIP dataset](https://adamharley.com/rvl-cdip/) processed using Amazon OCR.

The following metrics were computed on the evaluation set after the final optimization step:

* Evaluation Loss: 0.1856316477060318
* Evaluation Accuracy: 0.9519237980949524
* Evaluation Weighted F1: 0.9518911690649716
* Evaluation Micro F1: 0.9519237980949524
* Evaluation Macro F1: 0.9518042570370386
* Evaluation Weighted Recall: 0.9519237980949524
* Evaluation Micro Recall: 0.9519237980949524
* Evaluation Macro Recall: 0.9518171728908463
* Evaluation Weighted Precision: 0.9519094862975979
* Evaluation Micro Precision: 0.9519237980949524
* Evaluation Macro Precision: 0.9518423447239385
* Evaluation Runtime (seconds): 514.7031
* Evaluation Samples per Second: 77.713
* Evaluation Steps per Second: 1.214

## Training logs

See the wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok

### Training arguments

The following arguments were passed to the Trainer (a reconstructed `TrainingArguments` sketch is given at the end of this card):

- Output Directory: ./results
- Maximum Steps: 20000
- Per Device Train Batch Size: 32 (reduced from the paper's 64 due to CUDA memory constraints; training used 2 GPUs, for an effective batch size of 32 × 2 = 64)
- Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
- Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is used for DocVQA, so the default is assumed here)
- Weight Decay: 0 (not specified in the paper for RVL-CDIP, though 0.05 is used for PubLayNet; the default is assumed here)
- Evaluation Strategy: steps
- Evaluation Steps: 1000
- Evaluate on Start: True
- Save Strategy: steps
- Save Steps: 1000
- Save Total Limit: 5
- Learning Rate: 2e-5
- Load Best Model at End: True
- Metric for Best Model: accuracy
- Greater is Better: True
- Report to: wandb (log to Weights & Biases)
- Logging Steps: 1000
- Logging First Step: True
- Learning Rate Scheduler Type: cosine (not mentioned in the paper, but the PubLayNet GitHub example uses 'cosine')
- FP16: True (due to CUDA memory constraints)
- Dataloader Number of Workers: 4 (number of subprocesses used for data loading)
- DDP Find Unused Parameters: True

### Framework versions

- Transformers 4.42.3
- PyTorch 2.2.0+cu121
- Datasets 2.14.0
- Tokenizers 0.19.1
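
## Reconstructed TrainingArguments (sketch)

For reference, the argument list above corresponds roughly to the following `transformers.TrainingArguments`. This is a reconstruction from the list, not the original training script; parameter names assume Transformers 4.42.

```python
from transformers import TrainingArguments

# Reconstruction of the arguments listed under "Training arguments" above.
training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",            # called `evaluation_strategy` in older releases
    eval_steps=1000,
    eval_on_start=True,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```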
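
## How to use

A minimal inference sketch, assuming the processor was saved alongside this checkpoint (otherwise load it from `microsoft/layoutlmv3-base`). The `model_id`, image path, and OCR words/boxes below are placeholders, not values from this repository.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForSequenceClassification

model_id = "layoutlmv3-base-finetuned-rvlcdip"  # replace with the full Hub id of this checkpoint

processor = AutoProcessor.from_pretrained(model_id, apply_ocr=False)
model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)
model.eval()

image = Image.open("document.png").convert("RGB")

# This checkpoint was fine-tuned on Amazon OCR output, so supply your own OCR
# words and boxes (boxes normalized to LayoutLMv3's 0-1000 coordinate space).
# Setting apply_ocr=True instead would run Tesseract via pytesseract.
words = ["Invoice", "Total", "$42.00"]                                   # placeholder OCR words
boxes = [[70, 45, 180, 70], [70, 500, 140, 525], [150, 500, 230, 525]]  # placeholder boxes

encoding = processor(image, words, boxes=boxes, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits

predicted_class = logits.argmax(-1).item()
# id2label maps to the 16 RVL-CDIP classes if label names were saved in the config.
print(model.config.id2label[predicted_class])
```

Because the model was trained on Amazon OCR tokens, feeding it words and boxes from a different OCR engine (including the processor's built-in Tesseract path) may shift accuracy relative to the evaluation numbers reported above.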