---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---

# RT-DETRv2 Fine-tuned for Voucher Classification

This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.

## Model Details

### Model Description
- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3 classes
  - 0: digital (digital invoices)
  - 1: fisico (physical receipts on blank pages)
  - 2: tesoreria (small on-site payment receipts)

### Training Details

**Training Dataset:**
- **Total Samples**: 507
- **Class Distribution**:
  - **fisico** (id: 1): 241 samples (47.5%)
  - **digital** (id: 0): 147 samples (29.0%)
  - **tesoreria** (id: 2): 119 samples (23.5%)
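The percentages in the class distribution follow directly from the per-class counts. A small sketch that recomputes them (the helper name `class_shares` is illustrative):

```python
# Per-class training sample counts, copied from the list above.
COUNTS = {"fisico": 241, "digital": 147, "tesoreria": 119}


def class_shares(counts):
    """Return each class's share of the dataset in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print(sum(COUNTS.values()))  # 507
    for name, share in class_shares(COUNTS).items():
        print(f"{name}: {share}%")  # fisico: 47.5%, digital: 29.0%, tesoreria: 23.5%
```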


**Training Configuration:**
- **Image Size**: 800x800
- **Batch Size**: 24
- **Learning Rate**: 1.5e-05
- **Weight Decay**: 0.0001
- **Epochs**: 2
- **Validation Split**: 0.0 (no internal split; validation data comes from the external split described under Data Processing)
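With 507 training samples, a batch size of 24 and 2 epochs, the number of optimizer steps is small. The arithmetic below is a sketch that assumes a single device, no gradient accumulation, and that the last partial batch is kept (the `Trainer` default, `dataloader_drop_last=False`, if `Trainer` was used).

```python
import math

# Values from the training configuration above.
NUM_SAMPLES = 507
BATCH_SIZE = 24
EPOCHS = 2

# Assumes one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
last_batch = NUM_SAMPLES - (steps_per_epoch - 1) * BATCH_SIZE
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch, last_batch, total_steps)  # 22 3 44
```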

**Data Processing:**
- Pre-augmented dataset used (no runtime augmentation)
- External train/validation split, created with `create_train_val_split.py`
- Preprocessing: Resize + Normalization only
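The preprocessing is limited to resizing and normalization. The NumPy/Pillow sketch below illustrates that step at the 800x800 training resolution; bilinear resampling and the ImageNet mean/std values are assumptions, not settings confirmed by this card. In practice `AutoImageProcessor` applies whatever configuration is stored with the model.

```python
import numpy as np
from PIL import Image

# Assumed normalization constants (ImageNet mean/std); the processor saved
# with the model may use different values or skip normalization.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image: Image.Image, size: int = 800) -> np.ndarray:
    """Resize to size x size, rescale to [0, 1], normalize; return CHW float32."""
    resized = image.convert("RGB").resize((size, size), Image.BILINEAR)
    pixels = np.asarray(resized, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    pixels = (pixels - IMAGENET_MEAN) / IMAGENET_STD        # per-channel normalization
    return pixels.transpose(2, 0, 1)                        # HWC -> CHW


if __name__ == "__main__":
    demo = Image.new("RGB", (1200, 600), color=(255, 0, 0))
    print(preprocess(demo).shape)  # (3, 800, 800)
```

Note that resizing to a fixed square does not preserve the aspect ratio, which matches a plain "Resize" step but distorts tall receipts.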

### Performance Metrics

**Final Evaluation Results:**

**Dataset Information:**

*Training Dataset:*
- **Digital invoices**: 147 samples (29.0%)
- **Fisico receipts**: 241 samples (47.5%)
- **Tesoreria receipts**: 119 samples (23.5%)
- **Total training samples**: 507

**Model Configuration:**
- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 800×800 pixels
- **Training epochs**: 2
- **Batch size**: 24

**Training Hardware:**
- **GPU**: NVIDIA A100-SXM4-40GB
- **VRAM**: 39.6 GB
- **RAM**: 83.5 GB
- **GPU configuration**: A100 optimized

**Training Time**: 0.0 minutes

**Training Summary:**
- **Final training loss**: 0.0000
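The metrics declared in the card metadata (`map`, `map_50`, `map_75`) count a predicted box as a true positive only when its intersection over union (IoU) with a same-class ground-truth box reaches a threshold: 0.5 for mAP@50, 0.75 for mAP@75, and an average over thresholds 0.50 to 0.95 for COCO-style mAP. A minimal IoU helper for boxes in the `(x_min, y_min, x_max, y_max)` pixel format returned by `post_process_object_detection`:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    score = iou((10, 10, 110, 110), (20, 20, 120, 120))
    print(f"IoU = {score:.3f}")  # IoU = 0.681
    for threshold in (0.5, 0.75):
        print(threshold, score >= threshold)  # 0.5 True, then 0.75 False
```

The same shifted box therefore counts as correct for mAP@50 but not for mAP@75.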


### MLflow Tracking

- **MLflow Run ID**: c348e8235f8c40138c05c051fc207bb6
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification


## Usage

```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results: rescale boxes to the original image size.
# PIL's image.size is (width, height), so reverse it to (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,  # minimum confidence score for a detection to be kept
)[0]

# Print predictions (class IDs follow the mapping in "Model Description")
class_names = ["digital", "fisico", "tesoreria"]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {class_names[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")  # [x_min, y_min, x_max, y_max] in pixels
```
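Because the model is used as a voucher classifier, a single class per image is often what downstream code needs. The card does not say how detections are reduced to one label; one simple assumption is to take the class of the highest-confidence detection that survived the threshold, as in the hypothetical helper `classify_voucher` below.

```python
def classify_voucher(scores, labels, class_names=("digital", "fisico", "tesoreria")):
    """Return (class_name, score) of the highest-scoring detection, or None if empty.

    Accepts plain lists or 1-D tensors such as results["scores"] / results["labels"].
    """
    if len(scores) == 0:
        return None  # no detection passed the threshold
    best = max(range(len(scores)), key=lambda i: float(scores[i]))
    return class_names[int(labels[best])], float(scores[best])


if __name__ == "__main__":
    print(classify_voucher([0.62, 0.91], [1, 2]))  # ('tesoreria', 0.91)
```

With the usage code above, this would be called as `classify_voucher(results["scores"], results["labels"])`. Images that yield no detection return `None`, which a caller may want to treat as "unknown" or retry with a lower threshold.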

## Training Procedure

The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- External train/validation split for unbiased evaluation
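The split itself comes from `create_train_val_split.py`, whose contents are not shown in this card. A stratified split that keeps class proportions similar in both subsets might look like the sketch below; the function name `stratified_split`, the 20% validation fraction, and the seed are illustrative assumptions, not that script's behavior.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (path, class_id) pairs into train/val lists, preserving class proportions."""
    by_class = defaultdict(list)
    for path, class_id in samples:
        by_class[class_id].append((path, class_id))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for class_id in sorted(by_class):
        items = by_class[class_id][:]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


if __name__ == "__main__":
    data = [(f"voucher_{i:03d}.jpg", i % 3) for i in range(30)]
    train, val = stratified_split(data)
    print(len(train), len(val))  # 24 6
```

Because the dataset is pre-augmented, augmented copies of the same original image would ideally be kept on the same side of the split to avoid near-duplicates leaking into validation; this sketch does not handle that grouping.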

## Limitations and Bias

- Trained specifically on voucher/receipt images
- Performance may vary on images significantly different from training distribution
- Model optimized for 3-class voucher classification task

## Citation

If you use this model, please cite:

```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```