---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---

# RT-DETRv2 Fine-tuned for Voucher Classification

This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) that detects vouchers in images and assigns each detection one of three voucher classes.

## Model Details

### Model Description

- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3
  - 0: digital (digital invoices)
  - 1: fisico (physical receipts on blank pages)
  - 2: tesoreria (small on-site payment receipts)
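
During fine-tuning, class lists like the one above are usually attached to the model config as `id2label`/`label2id` mappings. The sketch below builds them from the list; the constant names are hypothetical, and the commented `from_pretrained` call shows the common Transformers fine-tuning pattern, not necessarily the exact call used to train this checkpoint.

```python
# Hypothetical constant: class names in id order, taken from the list above.
CLASS_NAMES = ["digital", "fisico", "tesoreria"]

# id -> name and name -> id mappings, in the form Transformers configs expect.
ID2LABEL = {i: name for i, name in enumerate(CLASS_NAMES)}
LABEL2ID = {name: i for i, name in ID2LABEL.items()}

# Assumed usage when loading the base model for fine-tuning; the
# ignore_mismatched_sizes flag lets the classification head be re-initialised
# for 3 classes instead of the base model's original label set:
#
# from transformers import AutoModelForObjectDetection
# model = AutoModelForObjectDetection.from_pretrained(
#     "PekingU/rtdetr_v2_r101vd",
#     id2label=ID2LABEL,
#     label2id=LABEL2ID,
#     ignore_mismatched_sizes=True,
# )

print(ID2LABEL)  # {0: 'digital', 1: 'fisico', 2: 'tesoreria'}
```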

### Training Details

**Training Dataset:**

- **Total Samples**: 1227
- **Class Distribution**:
  - **tesoreria** (id: 2): 405 samples (33.0%)
  - **fisico** (id: 1): 416 samples (33.9%)
  - **digital** (id: 0): 406 samples (33.1%)

**Training Configuration:**

- **Image Size**: 832x832
- **Batch Size**: 32
- **Learning Rate**: 3e-05
- **Weight Decay**: 0.01
- **Epochs**: 80
- **Validation Split**: 0.2
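
The hyperparameters above can be captured in a small config object, which also makes the optimizer-step budget explicit. This is a hypothetical sketch: `TrainConfig` is not part of the training code, and the step count assumes that all 1227 samples are training samples and that the last partial batch is kept. If the 0.2 validation split was carved out of those 1227 samples, pass the smaller training count instead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the "Training Configuration" list.
    image_size: int = 832
    batch_size: int = 32
    learning_rate: float = 3e-5
    weight_decay: float = 0.01
    epochs: int = 80
    validation_split: float = 0.2

    def steps_per_epoch(self, num_train_samples: int) -> int:
        # Assumption: the last partial batch is kept, not dropped.
        return math.ceil(num_train_samples / self.batch_size)

    def total_steps(self, num_train_samples: int) -> int:
        return self.steps_per_epoch(num_train_samples) * self.epochs


cfg = TrainConfig()
# Assumption: all 1227 listed samples are used for training.
print(cfg.steps_per_epoch(1227), cfg.total_steps(1227))  # 39 3120
```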

**Data Processing:**

- Pre-augmented dataset (no augmentation applied at training time)
- External train/validation split (required; created with `create_train_val_split.py`)
- Preprocessing: resize and normalization only
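
A minimal sketch of the resize-and-normalize step using NumPy and Pillow. In practice `AutoImageProcessor` handles this; the function name, the bilinear resampling, and the optional mean/std step are assumptions, so check the model's `preprocessor_config.json` for the exact settings.

```python
import numpy as np
from PIL import Image

# Assumed target size, matching the 832x832 training resolution.
IMAGE_SIZE = 832


def preprocess(image, mean=None, std=None):
    """Resize to IMAGE_SIZE x IMAGE_SIZE and return a CHW float32 array.

    Pixels are rescaled to [0, 1]; mean/std normalization is applied only if
    both are given, since whether it is used depends on the processor config.
    """
    resized = image.convert("RGB").resize((IMAGE_SIZE, IMAGE_SIZE), Image.BILINEAR)
    pixels = np.asarray(resized, dtype=np.float32) / 255.0  # HWC in [0, 1]
    if mean is not None and std is not None:
        pixels = (pixels - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    return pixels.transpose(2, 0, 1)  # HWC -> CHW, the layout PyTorch models expect


img = Image.new("RGB", (600, 400), color=(255, 128, 0))
x = preprocess(img)
print(x.shape)  # (3, 832, 832)
```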

### Performance Metrics

**Metric Definitions:**

- **mAP (mean Average Precision)**: Overall detection metric, averaged across all classes and across IoU thresholds from 0.50 to 0.95 in steps of 0.05 (COCO-style). Values range from 0.0 to 1.0; higher is better.
- **mAP@50**: mAP at a single IoU threshold of 0.5. More lenient: a detection counts if its box covers the object in roughly the right location.
- **mAP@75**: mAP at an IoU threshold of 0.75. Stricter: requires precise bounding-box localization.
- **IoU (Intersection over Union)**: Area of overlap between a predicted box and a ground-truth box, divided by the area of their union.
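
The IoU that these metrics rely on can be computed as follows. This is a plain-Python sketch assuming boxes in `(x_min, y_min, x_max, y_max)` pixel format, the same corner format returned by `post_process_object_detection` in the usage example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 square: 25 / (100 + 100 - 25)
print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 4))  # 0.1429
```

For mAP@50, a prediction matches a same-class ground-truth box only if their IoU is at least 0.5, so the pair above (IoU of about 0.14) would not count as a match.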

**Performance Ranges** (rough guide for interpreting the scores below):

- 0.9+: Excellent
- 0.8-0.9: Very Good
- 0.7-0.8: Good
- 0.5-0.7: Fair
- <0.5: Poor (needs improvement)
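
Applied programmatically, the ranges above amount to a threshold lookup. A minimal sketch, assuming each lower bound is inclusive (so exactly 0.8 counts as Very Good); the `rate` function is hypothetical.

```python
# Lower bounds from the "Performance Ranges" list, highest first.
RATINGS = [
    (0.9, "Excellent"),
    (0.8, "Very Good"),
    (0.7, "Good"),
    (0.5, "Fair"),
]


def rate(score):
    """Map a metric value in [0, 1] to the qualitative label used in this card."""
    for lower_bound, label in RATINGS:
        if score >= lower_bound:
            return label
    return "Poor"


print(rate(0.0))  # Poor
```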

**Final Evaluation Results:**

**Overall Detection Performance:**

- **mAP**: 0.0000
- **mAP@50**: 0.0000
- **mAP@75**: 0.0000

**Per-Class Average Precision:**

- **Digital invoices**: 0.0000 (needs improvement)
- **Fisico receipts**: 0.0000 (needs improvement)
- **Tesoreria receipts**: 0.0000 (needs improvement)

**Model Confidence:**

- **Digital invoices mean confidence**: 0.4346 (low)
- **Fisico receipts mean confidence**: 0.0000 (low)
- **Tesoreria receipts mean confidence**: 0.0000 (low)

**Performance by Object Size** (AP; a value of -1.0000 means the evaluation set contains no objects in that size range):

- **Small objects**: -1.0000
- **Medium objects**: -1.0000
- **Large objects**: 0.0000
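
In COCO-style evaluation (for example `pycocotools` or the `torchmetrics` `MeanAveragePrecision` metric), objects are bucketed by area: small below 32² px, medium from 32² to 96² px, large above 96² px. The sketch below assumes area is computed directly from the box; it illustrates why a dataset of page-sized vouchers produces only large objects, leaving the small and medium entries at -1.

```python
# COCO-style area thresholds in pixels^2 (boundary handling simplified).
SMALL_MAX_AREA = 32 ** 2   # 1024
MEDIUM_MAX_AREA = 96 ** 2  # 9216


def size_bucket(box):
    """Classify an (x_min, y_min, x_max, y_max) box as small, medium, or large by area."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def bucket_counts(boxes):
    """Count ground-truth boxes per bucket (evaluators report AP = -1 for empty buckets)."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for box in boxes:
        counts[size_bucket(box)] += 1
    return counts


# A voucher filling most of an 832x832 image is far into the "large" bucket.
print(bucket_counts([(20, 30, 800, 810)]))  # {'small': 0, 'medium': 0, 'large': 1}
```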

**Evaluation Dataset:**

- **Digital invoices**: 53 samples (27.5%)
- **Fisico receipts**: 127 samples (65.8%)
- **Tesoreria receipts**: 13 samples (6.7%)
- **Total evaluation samples**: 193
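
The training set is roughly balanced, while the evaluation set is dominated by `fisico` and contains only 13 `tesoreria` samples. This sketch recomputes both distributions from the counts listed in this card so they can be compared side by side; the constant and function names are illustrative.

```python
# Per-class sample counts copied from the training and evaluation lists.
TRAIN_COUNTS = {"digital": 406, "fisico": 416, "tesoreria": 405}
EVAL_COUNTS = {"digital": 53, "fisico": 127, "tesoreria": 13}


def shares(counts):
    """Return each class's share of the total as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


train_shares, eval_shares = shares(TRAIN_COUNTS), shares(EVAL_COUNTS)
for name in TRAIN_COUNTS:
    print(f"{name:10s} train {train_shares[name]:5.1f}%  eval {eval_shares[name]:5.1f}%")
```

With so few `tesoreria` evaluation samples, the per-class AP for that class rests on a very small number of examples.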

**Model Configuration:**

- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 832×832 pixels
- **Training epochs**: 80
- **Batch size**: 32

**Training Hardware:**

- **GPU**: NVIDIA H100 80GB HBM3
- **VRAM**: 79.2 GB
- **RAM**: 235.9 GB
- **GPU configuration**: H100 optimized

**Training Time**: 39.6 minutes

**Training Summary:**

- **Final training loss**: 4.9881
- **Final learning rate**: 2.08e-08

### MLflow Tracking

- **MLflow Run ID**: 1690d8d04ea74ca99f0fea73a8466f83
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification

## Usage

```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results: rescale boxes to the original image size.
# PIL's image.size is (width, height), so reverse it to (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,
)[0]

# Print predictions
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {model.config.id2label[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
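
If each image contains a single voucher, one way to turn detections into an image-level class is to keep the highest-scoring box. The helper below is hypothetical (it is not part of Transformers); after the post-processing step above it would be called as `classify_voucher(results["scores"], results["labels"], model.config.id2label)`. Given the low confidences reported in this card, a lower `threshold` in `post_process_object_detection` may be needed for any detection to survive.

```python
def classify_voucher(scores, labels, id2label, min_score=0.0):
    """Return (class_name, score) for the highest-scoring detection, or None.

    `scores` and `labels` are per-detection values such as those returned by
    post_process_object_detection; 0-d tensors and plain numbers both work.
    """
    best = None
    for score, label in zip(scores, labels):
        score, label = float(score), int(label)
        if score >= min_score and (best is None or score > best[1]):
            best = (id2label[label], score)
    return best


id2label = {0: "digital", 1: "fisico", 2: "tesoreria"}
print(classify_voucher([0.31, 0.62, 0.12], [2, 1, 0], id2label))  # ('fisico', 0.62)
```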

## Training Procedure

The model was fine-tuned with the Hugging Face Transformers library, using:

- A pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- An external train/validation split, required for unbiased evaluation (there is no fallback to evaluating on training data)
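
This card requires an external train/validation split but does not show `create_train_val_split.py`. As an illustration only, a reproducible per-class (stratified) split could look like the sketch below; the function name, the seed, and the `(sample_id, label)` input format are assumptions, not the script's actual interface. The evaluation counts reported above are not class-balanced, so the real script does not necessarily stratify.

```python
import random
from collections import defaultdict


def stratified_split(items, val_fraction=0.2, seed=42):
    """Split (sample_id, label) pairs into train and validation lists per class.

    Hypothetical stand-in for create_train_val_split.py: each class contributes
    roughly `val_fraction` of its samples to the validation set.
    """
    by_label = defaultdict(list)
    for sample_id, label in items:
        by_label[label].append(sample_id)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label, ids in sorted(by_label.items()):
        ids = ids[:]  # copy so the caller's data is not shuffled in place
        rng.shuffle(ids)
        n_val = round(len(ids) * val_fraction)
        val += [(i, label) for i in ids[:n_val]]
        train += [(i, label) for i in ids[n_val:]]
    return train, val


# Toy items using the per-class counts from the training list.
counts = {0: 406, 1: 416, 2: 405}
items = [(f"{label}_{k}", label) for label, n in counts.items() for k in range(n)]
train, val = stratified_split(items)
print(len(train), len(val))
```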

## Limitations and Bias

- Trained specifically on voucher/receipt images
- Performance may vary on images that differ significantly from the training distribution
- Optimized for the 3-class voucher classification task
- The reported evaluation scores are 0.0000 mAP for every class, so predictions from this checkpoint should be verified before use

## Citation

If you use this model, please cite:

```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```