
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
  - object-detection
  - computer-vision
  - voucher-classification
  - rt-detr
  - rtdetrv2
datasets:
  - custom-voucher-dataset
metrics:
  - map
  - map_50
  - map_75
widget:
  - src: >-
      https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
    example_title: Example Image

RT-DETRv2 Fine-tuned for Voucher Classification

This model is a fine-tuned version of PekingU/rtdetr_v2_r101vd. It detects vouchers in document images and assigns each detection to one of three voucher classes.

Model Details

Model Description

Model Type: Object Detection (RT-DETRv2)
Base Model: PekingU/rtdetr_v2_r101vd
Task: Multi-class voucher classification and detection
Classes: 3 classes
- 0: digital (digital invoices)
- 1: fisico (physical receipts on blank pages)
- 2: tesoreria (small on-site payment receipts)
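
The class indices above can be captured as a pair of lookup tables. This is a minimal sketch: the names `ID2LABEL` and `LABEL2ID` are illustrative, and this card does not state whether the published checkpoint's config already carries the same mapping.

```python
# Class index -> class name, copied from the class list above.
ID2LABEL = {0: "digital", 1: "fisico", 2: "tesoreria"}

# Inverse mapping (name -> index), derived rather than typed by hand
# so the two tables cannot drift apart.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(ID2LABEL[1])
print(LABEL2ID["tesoreria"])
```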

Training Details

Training Dataset:

Total Samples: 507
Class Distribution:
fisico (id: 1): 241 samples (47.5%)
digital (id: 0): 147 samples (29.0%)
tesoreria (id: 2): 119 samples (23.5%)
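
The percentages above follow directly from the sample counts. The short sketch below recomputes them and adds a rough imbalance ratio; the variable names are illustrative and the ratio is not a figure reported by the author.

```python
# Sample counts per class, copied from the class distribution above.
train_counts = {"fisico": 241, "digital": 147, "tesoreria": 119}

total = sum(train_counts.values())

# Share of each class as a percentage of the training set, one decimal place.
train_share = {name: round(100 * n / total, 1) for name, n in train_counts.items()}

# Ratio between the most and least frequent class, a rough imbalance measure.
imbalance_ratio = max(train_counts.values()) / min(train_counts.values())

for name, pct in train_share.items():
    print(f"{name}: {pct}%")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```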

Training Configuration:

Image Size: 800x800
Batch Size: 24
Learning Rate: 1.5e-05
Weight Decay: 0.0001
Epochs: 2
Validation Split: 0.0 (no in-run split; validation data comes from the external split)
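
One way to keep these hyperparameters in a single place is a small config object. The sketch below is an assumption about how the values might be organized, not the author's training script. `FineTuneConfig` and `to_trainer_kwargs` are hypothetical names; the returned keys follow the keyword arguments of `transformers.TrainingArguments`, and mapping the batch size to a per-device value assumes single-GPU training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in the training configuration above."""

    image_size: int = 800
    batch_size: int = 24
    learning_rate: float = 1.5e-5
    weight_decay: float = 1e-4
    epochs: int = 2
    validation_split: float = 0.0  # no in-run split; validation data is external

    def to_trainer_kwargs(self) -> dict:
        # Key names match transformers.TrainingArguments keyword arguments.
        # Assumption: one GPU, so the batch size is the per-device batch size.
        return {
            "per_device_train_batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
            "weight_decay": self.weight_decay,
            "num_train_epochs": self.epochs,
        }


config = FineTuneConfig()
print(config.to_trainer_kwargs())
```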

Data Processing:

Pre-augmented dataset used (no runtime augmentation)
External train/validation split is required; create it with create_train_val_split.py before training
Preprocessing: Resize + Normalization only
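
The contents of create_train_val_split.py are not shown in this card. As an illustration of what an external split can look like, here is a minimal stratified split using only the standard library. The function name, the 20% validation fraction, and the placeholder file names are assumptions, not the author's implementation.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into train/val lists, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Toy input: 10 placeholder files per class.
samples = [
    (f"{name}_{i}.jpg", name)
    for name in ("digital", "fisico", "tesoreria")
    for i in range(10)
]
train, val = stratified_split(samples, val_fraction=0.2)
print(len(train), len(val))
```

With a pre-augmented dataset, augmented copies of one source image would normally be kept in the same split, otherwise near-duplicates can leak from training into validation. The sketch does not handle that grouping.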

Performance Metrics

Metric Definitions:

mAP (mean Average Precision): Overall detection metric averaged across all classes and IoU thresholds (under the COCO convention, 0.50 to 0.95 in steps of 0.05). Values range from 0.0 to 1.0; higher is better
mAP@50: mAP calculated at an IoU threshold of 0.5. A lenient threshold that checks whether objects are found in roughly the correct location
mAP@75: mAP calculated at an IoU threshold of 0.75. A strict threshold that requires precise bounding box localization
IoU (Intersection over Union): Area of overlap between the predicted and ground-truth bounding boxes divided by the area of their union
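
To make the IoU definition concrete, the sketch below computes it for two axis-aligned boxes in (x_min, y_min, x_max, y_max) format and checks the result against the 0.50 and 0.75 thresholds used by mAP@50 and mAP@75. The function name and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
score = iou(predicted, ground_truth)
print(f"IoU = {score:.3f}")
print("counts as a match at 0.50:", score >= 0.50)
print("counts as a match at 0.75:", score >= 0.75)
```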

Performance Ranges:

0.9+: Excellent
0.8-0.9: Very Good
0.7-0.8: Good
0.5-0.7: Fair
<0.5: Poor (needs improvement)
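
The ranges above can be expressed as a small lookup function. This is a sketch with two assumptions: each lower bound is treated as inclusive (so exactly 0.9 counts as Excellent), and negative values are treated as unavailable rather than poor. `rate_score` is an illustrative name.

```python
def rate_score(value):
    """Map a metric value in [0, 1] to the qualitative ranges listed above."""
    if value < 0:
        return "not available"  # negative values are sentinels, not scores
    if value >= 0.9:
        return "Excellent"
    if value >= 0.8:
        return "Very Good"
    if value >= 0.7:
        return "Good"
    if value >= 0.5:
        return "Fair"
    return "Poor (needs improvement)"


for value in (0.95, 0.85, 0.75, 0.6, 0.0, -1.0):
    print(f"{value:+.2f} -> {rate_score(value)}")
```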

Final Evaluation Results:

Overall Detection Performance:

mAP: 0.0000
mAP@50: 0.0000
mAP@75: 0.0000

Per-Class Average Precision:

Digital invoices: 0.0000 (needs improvement)
Fisico receipts: 0.0000 (needs improvement)
Tesoreria receipts: 0.0000 (needs improvement)

Model Confidence:

Digital invoices mean confidence: 0.7041 (moderate)
Fisico receipts mean confidence: 0.5998 (low)
Tesoreria receipts mean confidence: 0.5715 (low)

Performance by Object Size:

Small objects: 0.0000
Medium objects: -1.0000 (in COCO-style evaluation, -1 indicates that no ground-truth objects fell in this size range)
Large objects: 0.0000
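
The card does not name the evaluator, but the small/medium/large breakdown matches the COCO convention, where size is determined by box area in pixels: below 32×32 is small, below 96×96 is medium, and anything larger is large. Assuming those thresholds apply here, the sketch below buckets boxes by size and shows how a bucket can end up empty. The example boxes are made up.

```python
from collections import Counter

SMALL_MAX_AREA = 32 ** 2   # COCO: "small" boxes have area below 32x32 pixels
MEDIUM_MAX_AREA = 96 ** 2  # COCO: "medium" boxes have area below 96x96 pixels


def size_bucket(box):
    """Return the COCO size bucket for a box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    area = (x_max - x_min) * (y_max - y_min)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Made-up ground-truth boxes: one tiny stamp-sized box and two page-sized boxes.
boxes = [(0, 0, 20, 20), (0, 0, 600, 750), (10, 10, 790, 790)]
counts = Counter(size_bucket(b) for b in boxes)
for bucket in ("small", "medium", "large"):
    print(bucket, counts.get(bucket, 0))
```

In this toy set the medium bucket is empty, which is the situation in which a COCO-style evaluator has no ground truth to score against for that bucket.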

Evaluation Dataset:

Digital invoices: 157 samples (28.5%)
Fisico receipts: 261 samples (47.4%)
Tesoreria receipts: 133 samples (24.1%)
Total evaluation samples: 551

Model Configuration:

Base model: PekingU/rtdetr_v2_r101vd
Architecture: rtdetr_v2_r101vd
Input resolution: 800×800 pixels
Training epochs: 2
Batch size: 24

Training Hardware:

GPU: NVIDIA A100-SXM4-40GB
VRAM: 39.6 GB
RAM: 83.5 GB
GPU configuration: A100 optimized

Training Time: 0.6 minutes

Training Summary:

Final training loss: 1361.6241
Final learning rate: 1.43e-07

MLflow Tracking

MLflow Run ID: 0bf1954e36da45088455964384408885
MLflow Experiment: RT-DETRv2_Voucher_Classification
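
If you have access to the tracking server that holds this run, its logged metrics can be read back with the MLflow client API. The sketch below is hedged: the tracking server location is not given in this card, so `tracking_uri` is a parameter you must supply, and `fetch_run_metrics` is an illustrative helper name.

```python
RUN_ID = "0bf1954e36da45088455964384408885"  # run ID quoted above


def fetch_run_metrics(tracking_uri, run_id=RUN_ID):
    """Return the metrics logged for an MLflow run as a plain dict."""
    import mlflow  # imported lazily so this module loads without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)
    run = mlflow.get_run(run_id)
    return dict(run.data.metrics)


# Example call (requires MLflow and a reachable tracking server; the URI is a placeholder):
# metrics = fetch_run_metrics("http://localhost:5000")
```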

Usage

from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs, 
    target_sizes=target_sizes, 
    threshold=0.5
)[0]

# Print predictions
class_names = ["digital", "fisico", "tesoreria"]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {class_names[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
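
When only one class per image is needed, the detections can be reduced to the single highest-scoring one. The helper below is a sketch of that idea, not part of the model's API: `top_prediction` is a hypothetical name, and it works on plain Python lists such as `results["scores"].tolist()` and `results["labels"].tolist()` from the snippet above.

```python
def top_prediction(scores, labels, class_names, min_score=0.5):
    """Return (class_name, score) of the best detection, or None if none pass."""
    best = None
    for score, label in zip(scores, labels):
        if score >= min_score and (best is None or score > best[1]):
            best = (class_names[label], score)
    return best


class_names = ["digital", "fisico", "tesoreria"]

# Made-up scores and labels standing in for real model output.
example_scores = [0.62, 0.81, 0.40]
example_labels = [1, 0, 2]
print(top_prediction(example_scores, example_labels, class_names))
```

Returning `None` when nothing clears the threshold lets the caller distinguish "no voucher found" from a low-confidence guess.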

Training Procedure

The model was fine-tuned using the Hugging Face Transformers library with:

Pre-augmented dataset focusing on challenging cases
Format-specific augmentation strategies applied during data preparation
MLflow experiment tracking for reproducibility
A required external train/validation split for unbiased evaluation (no fallback to evaluating on training data)

Limitations and Bias

Trained specifically on voucher/receipt images
Performance may vary on images that differ significantly from the training distribution
Model optimized for a 3-class voucher classification task
Reported evaluation mAP is 0.0000 for all classes after 2 training epochs; validate predictions on your own data before relying on them

Citation

If you use this model, please cite:

@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}