---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---

# RT-DETRv2 Fine-tuned for Voucher Classification

This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.

## Model Details

### Model Description
- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3 classes
  - 0: digital (digital invoices)
  - 1: fisico (physical receipts on blank pages)
  - 2: tesoreria (small on-site payment receipts)

### Training Details

**Training Dataset:**
- **Total Samples**: 663
- **Class Distribution**:
  - **fisico** (id: 1): 441 samples (66.5%)
  - **digital** (id: 0): 177 samples (26.7%)
  - **tesoreria** (id: 2): 45 samples (6.8%)
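
The percentages above follow directly from the sample counts. The short sketch below recomputes them and adds a simple imbalance ratio; the ratio is an illustrative extra and is not a figure reported by the training run.

```python
# Sample counts per class, copied from the class distribution above.
class_counts = {"fisico": 441, "digital": 177, "tesoreria": 45}

total = sum(class_counts.values())
shares = {name: 100.0 * n / total for name, n in class_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {class_counts[name]} samples ({pct:.1f}%)")

# Ratio between the largest and smallest class (illustrative imbalance indicator).
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```

With roughly ten times more fisico than tesoreria samples, per-class metrics are worth checking separately from the overall score.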


**Training Configuration:**
- **Image Size**: 832x832
- **Batch Size**: 32
- **Learning Rate**: 1e-05
- **Weight Decay**: 0.01
- **Epochs**: 80
- **Validation Split**: 0.15

**Data Processing:**
- Pre-augmented dataset used (no runtime augmentation)
- External train/validation split (required; use `create_train_val_split.py`)
- Preprocessing: Resize + Normalization only
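
The contents of `create_train_val_split.py` are not shown in this card. As a rough sketch of what an external split might look like, the example below performs a per-class (stratified) split with a 0.15 validation fraction. The function name `stratified_split`, the seed, and the synthetic file names are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(samples, val_fraction=0.15, seed=42):
    """Split (path, label) pairs into train/val lists, class by class."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        # Keep at least one validation sample per class.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Synthetic file names; the counts reuse the class distribution listed in this card.
samples = (
    [(f"fisico_{i}.jpg", "fisico") for i in range(441)]
    + [(f"digital_{i}.jpg", "digital") for i in range(177)]
    + [(f"tesoreria_{i}.jpg", "tesoreria") for i in range(45)]
)
train, val = stratified_split(samples)
print(len(train), len(val))
```

Splitting per class keeps the rare tesoreria class represented in validation, which a purely random split would not guarantee.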

### Performance Metrics

**Metric Definitions:**

- **mAP (mean Average Precision)**: Overall performance metric, averaged across all classes and IoU thresholds (assuming COCO-style evaluation, thresholds from 0.50 to 0.95). Values range from 0.0 to 1.0; higher is better
- **mAP@50**: mAP calculated at an IoU threshold of 0.5. More lenient: measures whether objects are found in roughly the correct location
- **mAP@75**: mAP calculated at an IoU threshold of 0.75. Stricter: requires precise bounding box localization
- **IoU (Intersection over Union)**: Area of overlap between the predicted and ground truth bounding boxes, divided by the area of their union
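
To make the IoU definition concrete, here is a minimal sketch for axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format. The helper name `iou` and the example boxes are illustrative; this is not the evaluation code used for this model.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (0, 0, 100, 100)
predicted = (0, 0, 100, 60)
score = iou(ground_truth, predicted)
print(f"IoU = {score:.2f}")
print("match at 0.50:", score >= 0.50)
print("match at 0.75:", score >= 0.75)
```

In this example the prediction covers 60% of the ground truth box, so it counts as a match for mAP@50 but not for mAP@75.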

**Performance Ranges:**
- 0.9+: Excellent
- 0.8-0.9: Very Good  
- 0.7-0.8: Good
- 0.5-0.7: Fair
- <0.5: Poor (needs improvement)
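
The ranges above can be expressed as a small lookup helper. The function name `rate_score` is hypothetical, and treating each lower bound as inclusive is an assumption, since the list does not say which range a boundary value such as 0.8 belongs to.

```python
def rate_score(score):
    """Map a metric value in [0, 1] to the qualitative ranges listed above."""
    if score >= 0.9:
        return "Excellent"
    if score >= 0.8:
        return "Very Good"
    if score >= 0.7:
        return "Good"
    if score >= 0.5:
        return "Fair"
    return "Poor (needs improvement)"

for value in (0.95, 0.85, 0.75, 0.6, 0.0):
    print(f"{value:.2f}: {rate_score(value)}")
```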

**Final Evaluation Results:**


**Overall Detection Performance:**
- **mAP**: 0.0000
- **mAP@50**: 0.0000
- **mAP@75**: 0.0000

**Per-Class Average Precision:**
- **Digital invoices**: 0.0000 (needs improvement)
- **Fisico receipts**: 0.0000 (needs improvement)
- **Tesoreria receipts**: 0.0000 (needs improvement)

**Model Confidence:**
- **Digital invoices mean confidence**: 0.4218 (low)
- **Fisico receipts mean confidence**: 0.3837 (low)
- **Tesoreria receipts mean confidence**: 0.0000 (low)

**Performance by Object Size:**
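- **Small objects**: -1.0000 (in COCO-style evaluation, -1 typically means no ground-truth objects of this size were present)
- **Medium objects**: -1.0000 (in COCO-style evaluation, -1 typically means no ground-truth objects of this size were present)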
- **Large objects**: 0.0000
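
The size buckets are not defined in this card. Assuming the evaluation follows the COCO convention, objects are grouped by area in square pixels: small below 32², medium from 32² up to 96², and large above that. The sketch below applies those thresholds to a bounding box; the helper name `size_bucket` and the example boxes are illustrative.

```python
# COCO size thresholds in square pixels (assumed to apply to this evaluation).
SMALL_MAX_AREA = 32 ** 2    # 1024
MEDIUM_MAX_AREA = 96 ** 2   # 9216

def size_bucket(box):
    """Classify an (x_min, y_min, x_max, y_max) box as small, medium, or large."""
    width = max(0.0, box[2] - box[0])
    height = max(0.0, box[3] - box[1])
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"

print(size_bucket((0, 0, 20, 20)))     # area 400
print(size_bucket((0, 0, 60, 60)))     # area 3600
print(size_bucket((0, 0, 600, 800)))   # area 480000
```

Under these thresholds, a voucher that fills most of a scanned page would fall into the large bucket.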

**Evaluation Dataset:**
- **Digital invoices**: 53 samples (27.5%)
- **Fisico receipts**: 127 samples (65.8%)
- **Tesoreria receipts**: 13 samples (6.7%)
- **Total evaluation samples**: 193

**Model Configuration:**
- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 832×832 pixels
- **Training epochs**: 80
- **Batch size**: 32

**Training Hardware:**
- **GPU**: NVIDIA H100 80GB HBM3
- **VRAM**: 79.2 GB
- **RAM**: 235.9 GB
- **GPU configuration**: H100 optimized

**Training Time**: 27.0 minutes

**Training Summary:**
- **Final training loss**: 10.7460
- **Final learning rate**: 1.77e-11


### MLflow Tracking

- **MLflow Run ID**: 6b50f63a6e3144b7a719bbb2b15cb77a
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification


## Usage

```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs, 
    target_sizes=target_sizes, 
    threshold=0.5
)[0]

# Print predictions
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {model.config.id2label[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
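
When a single label per image is wanted rather than a list of boxes, one option is to keep the highest-scoring detection. The sketch below works on plain Python lists, such as those produced by `results["scores"].tolist()` and `results["labels"].tolist()`. The helper name `top_voucher_class` and the example values are hypothetical, and the label map is copied from the class list in this card rather than read from the model config.

```python
# Label map copied from the class list in this card.
ID2LABEL = {0: "digital", 1: "fisico", 2: "tesoreria"}

def top_voucher_class(scores, labels, id2label=ID2LABEL):
    """Return (class_name, score) of the highest-scoring detection, or None."""
    if not scores:
        return None  # no detection passed the confidence threshold
    best = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[labels[best]], scores[best]

# Placeholder values for illustration, not real model output.
example_scores = [0.62, 0.91, 0.55]
example_labels = [0, 1, 1]
print(top_voucher_class(example_scores, example_labels))
print(top_voucher_class([], []))
```

Returning `None` for an empty result makes the "nothing detected" case explicit, so callers can handle it separately from a low-confidence prediction.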

## Training Procedure

The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- A required external train/validation split for unbiased evaluation (no fallback to training data)

## Limitations and Bias

- Trained specifically on voucher/receipt images
- Performance may vary on images significantly different from training distribution
- Model optimized for 3-class voucher classification task

## Citation

If you use this model, please cite:

```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```