---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---
# RT-DETRv2 Fine-tuned for Voucher Classification
This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.
## Model Details
### Model Description
- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3 classes
  - 0: digital (digital invoices)
  - 1: fisico (physical receipts on blank pages)
  - 2: tesoreria (small on-site payment receipts)
### Training Details
**Training Dataset:**
- **Total Samples**: 663
- **Class Distribution**:
  - **fisico** (id: 1): 441 samples (66.5%)
  - **digital** (id: 0): 177 samples (26.7%)
  - **tesoreria** (id: 2): 45 samples (6.8%)
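
The percentages above follow directly from the per-class counts. As a quick check, the sketch below recomputes them and also derives the ratio between the largest and smallest class, which is one simple way to quantify the imbalance. The variable names are illustrative and not part of the training code.

```python
# Per-class sample counts as listed in the class distribution above.
counts = {"fisico": 441, "digital": 177, "tesoreria": 45}

total = sum(counts.values())

# Share of each class as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Ratio between the most and least frequent class (illustrative imbalance measure).
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)
print(shares)
print(round(imbalance_ratio, 1))
```
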
**Training Configuration:**
- **Image Size**: 832x832
- **Batch Size**: 32
- **Learning Rate**: 1e-05
- **Weight Decay**: 0.01
- **Epochs**: 80
- **Validation Split**: 0.15
**Data Processing:**
- Pre-augmented dataset used (no runtime augmentation)
- External train/validation split (required; create it with `create_train_val_split.py`)
- Preprocessing: Resize + Normalization only
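
The contents of `create_train_val_split.py` are not shown in this card. As a rough illustration of what an external split with a 0.15 validation fraction could look like, here is a minimal stratified split using only the standard library. The function name, the seed, and the choice to stratify by class are assumptions, and the class counts are used purely as sample input.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.15, seed=0):
    """Return (train_indices, val_indices), splitting each class separately.

    Hypothetical helper: not the project's create_train_val_split.py.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation sample per class.
        n_val = max(1, round(len(indices) * val_fraction))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Sample input: one label per image, using the class counts from this card.
labels = ["fisico"] * 441 + ["digital"] * 177 + ["tesoreria"] * 45
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting per class keeps a small class such as tesoreria represented in both subsets, which a purely random split does not guarantee.
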
### Performance Metrics
**Metric Definitions:**
- **mAP (mean Average Precision)**: Overall detection quality, averaged across all classes and across a range of IoU thresholds (in the COCO convention, 0.50 to 0.95 in steps of 0.05). Values range from 0.0 to 1.0; higher is better
- **mAP@50**: mAP at an IoU threshold of 0.5. This is the more lenient setting: it checks whether objects are found in roughly the correct location
- **mAP@75**: mAP at an IoU threshold of 0.75. This is the stricter setting: it requires precise bounding box localization
- **IoU (Intersection over Union)**: Overlap between predicted and ground truth bounding boxes
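
To make the IoU definition concrete, the sketch below computes it for two axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format. The example boxes are made up for illustration; they are chosen so that the prediction would count as a match at the 0.5 threshold but not at 0.75.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


prediction = (0, 0, 100, 100)
ground_truth = (20, 0, 120, 100)
score = iou(prediction, ground_truth)
print(f"IoU = {score:.3f}")
print("match at 0.50:", score >= 0.5)
print("match at 0.75:", score >= 0.75)
```
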
**Performance Ranges:**
- 0.9+: Excellent
- 0.8-0.9: Very Good
- 0.7-0.8: Good
- 0.5-0.7: Fair
- <0.5: Poor (needs improvement)
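
The ranges above can be expressed as a small lookup, for example when annotating a metrics report. This is a minimal sketch: the function name is hypothetical, and treating each lower bound as inclusive is an assumption, since the list does not say which range a boundary value such as 0.8 belongs to.

```python
# (lower bound, label) pairs, checked from the highest bound down.
# Assumption: lower bounds are inclusive.
RATING_BANDS = [
    (0.9, "Excellent"),
    (0.8, "Very Good"),
    (0.7, "Good"),
    (0.5, "Fair"),
]


def rate_score(value):
    """Map a metric value in [0, 1] to the qualitative label listed above."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a value in [0, 1], got {value}")
    for lower_bound, label in RATING_BANDS:
        if value >= lower_bound:
            return label
    return "Poor"


for example in (0.95, 0.85, 0.72, 0.5, 0.0):
    print(example, rate_score(example))
```
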
**Final Evaluation Results:**
**Overall Detection Performance:**
- **mAP**: 0.0000
- **mAP@50**: 0.0000
- **mAP@75**: 0.0000
**Per-Class Average Precision:**
- **Digital invoices**: 0.0000 (needs improvement)
- **Fisico receipts**: 0.0000 (needs improvement)
- **Tesoreria receipts**: 0.0000 (needs improvement)
**Model Confidence:**
- **Digital invoices mean confidence**: 0.4218 (low)
- **Fisico receipts mean confidence**: 0.3837 (low)
- **Tesoreria receipts mean confidence**: 0.0000 (low)
**Performance by Object Size:**
- **Small objects**: -1.0000
- **Medium objects**: -1.0000
- **Large objects**: 0.0000
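
The -1.0000 entries are most likely the sentinel that COCO-style evaluators (such as pycocotools or torchmetrics) return when the evaluation set contains no ground-truth boxes in a size bucket, rather than a measured score. That reading is an assumption, since this card does not name the evaluator. Under the same assumption, the buckets are defined by box area in pixels, as sketched below with a hypothetical helper.

```python
# COCO-style area boundaries in square pixels (assumed evaluator convention).
SMALL_MAX_AREA = 32 ** 2   # 1024
MEDIUM_MAX_AREA = 96 ** 2  # 9216


def coco_size_bucket(box):
    """Classify an (x_min, y_min, x_max, y_max) box as small, medium, or large.

    Boxes exactly on a boundary are assigned to the larger bucket here;
    reference evaluators treat the boundaries as inclusive on both sides.
    """
    x_min, y_min, x_max, y_max = box
    area = max(0, x_max - x_min) * max(0, y_max - y_min)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


print(coco_size_bucket((0, 0, 20, 20)))    # 400 px^2
print(coco_size_bucket((0, 0, 50, 50)))    # 2500 px^2
print(coco_size_bucket((0, 0, 600, 800)))  # 480000 px^2
```

A voucher that fills most of a scanned page falls well inside the large bucket, which would be consistent with only the large-object score being populated.
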
**Evaluation Dataset:**
- **Digital invoices**: 53 samples (27.5%)
- **Fisico receipts**: 127 samples (65.8%)
- **Tesoreria receipts**: 13 samples (6.7%)
- **Total evaluation samples**: 193
**Model Configuration:**
- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 832×832 pixels
- **Training epochs**: 80
- **Batch size**: 32
**Training Hardware:**
- **GPU**: NVIDIA H100 80GB HBM3
- **VRAM**: 79.2 GB
- **RAM**: 235.9 GB
- **GPU configuration**: H100 optimized
**Training Time**: 27.0 minutes
**Training Summary:**
- **Final training loss**: 10.7460
- **Final learning rate**: 1.77e-11
### MLflow Tracking
- **MLflow Run ID**: 6b50f63a6e3144b7a719bbb2b15cb77a
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification
## Usage
```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,
)[0]

# Print predictions
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {model.config.id2label[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
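
If a single image-level class is needed rather than a list of boxes, one option is to take the label of the highest-scoring detection. The sketch below is an assumption about how the output could be consumed, not part of the published pipeline. It works on plain Python lists, which can be obtained from the post-processed output with `results["scores"].tolist()` and `results["labels"].tolist()`. The scores are made-up values for illustration.

```python
# Class mapping as listed under Model Description.
ID2LABEL = {0: "digital", 1: "fisico", 2: "tesoreria"}


def image_level_class(scores, labels, threshold=0.5):
    """Return (class_name, score) of the best detection, or None if none pass.

    Hypothetical helper for turning detections into one label per image.
    """
    kept = [(s, l) for s, l in zip(scores, labels) if s >= threshold]
    if not kept:
        return None
    best_score, best_label = max(kept, key=lambda pair: pair[0])
    return ID2LABEL[best_label], best_score


# Illustrative detections, all below the 0.5 threshold used above.
scores = [0.42, 0.38, 0.12]
labels = [0, 1, 1]

strict = image_level_class(scores, labels, threshold=0.5)
lenient = image_level_class(scores, labels, threshold=0.3)
print(strict)
print(lenient)
```

The mean confidences reported under Model Confidence (0.4218 and 0.3837) are below the 0.5 threshold, so a lower threshold may be needed before any detection is returned.
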
## Training Procedure
The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- An external train/validation split, which is required for unbiased evaluation (there is no fallback to evaluating on training data)
## Limitations and Bias
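- Trained specifically on voucher and receipt images
- Performance may vary on images that differ significantly from the training distribution
- Optimized for the 3-class voucher task (digital, fisico, tesoreria); other document categories are not covered
- The training data is imbalanced: tesoreria accounts for only 6.8% of samples (45 of 663)
- The reported evaluation results show an mAP of 0.0000 for every class, so detections from this checkpoint should be validated before use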
## Citation
If you use this model, please cite:
```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```