---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---
# RT-DETRv2 Fine-tuned for Voucher Classification
This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.
## Model Details
### Model Description
- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3 classes
  - 0: `digital` (digital invoices)
  - 1: `fisico` (physical receipts on blank pages)
  - 2: `tesoreria` (small on-site payment receipts)
### Training Details
**Training Dataset:**
- **Total Samples**: 507
- **Class Distribution**:
  - **fisico** (id: 1): 241 samples (47.5%)
  - **digital** (id: 0): 147 samples (29.0%)
  - **tesoreria** (id: 2): 119 samples (23.5%)
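The percentages above follow directly from the per-class counts. As a quick check, a minimal sketch that recomputes them (the `imbalance_ratio` value is an extra illustration, not a figure reported by this card):

```python
# Per-class sample counts as listed in the class distribution above.
counts = {"digital": 147, "fisico": 241, "tesoreria": 119}

total = sum(counts.values())

# Share of each class in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Ratio between the largest and smallest class (illustrative only).
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)
print(shares)
print(round(imbalance_ratio, 2))
```

The largest class (`fisico`) is roughly twice the size of the smallest (`tesoreria`), so per-class metrics may be more informative than a single aggregate score.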
**Training Configuration:**
- **Image Size**: 800x800
- **Batch Size**: 24
- **Learning Rate**: 1.5e-05
- **Weight Decay**: 0.0001
- **Epochs**: 2
- **Validation Split**: 0.0
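To give a sense of how short this run is, the sketch below derives the number of optimizer steps from the configuration above. It assumes all 507 samples are used for training (validation split 0.0), a single device, no gradient accumulation, and that the final partial batch is kept; none of these details are stated in this card, and the `config` dictionary is only an illustrative container, not the actual training script's format.

```python
import math

# Hyperparameters copied from the training configuration above.
config = {
    "image_size": (800, 800),
    "batch_size": 24,
    "learning_rate": 1.5e-5,
    "weight_decay": 1e-4,
    "epochs": 2,
}

num_samples = 507  # total training samples reported in this card

# Assumption: the last, smaller batch is not dropped.
steps_per_epoch = math.ceil(num_samples / config["batch_size"])
total_steps = steps_per_epoch * config["epochs"]

print(steps_per_epoch, total_steps)
```

Under these assumptions the model sees only a few dozen updates in total, which is worth keeping in mind when interpreting the results.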
**Data Processing:**
- Pre-augmented dataset used (no runtime augmentation)
- External train/validation split (created with `create_train_val_split.py`)
- Preprocessing: Resize + Normalization only
### Performance Metrics
**Final Evaluation Results:**
**Dataset Information:**
*Training Dataset:*
- **Digital invoices**: 147 samples (29.0%)
- **Fisico receipts**: 241 samples (47.5%)
- **Tesoreria receipts**: 119 samples (23.5%)
- **Total training samples**: 507
**Model Configuration:**
- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 800×800 pixels
- **Training epochs**: 2
- **Batch size**: 24
**Training Hardware:**
- **GPU**: NVIDIA A100-SXM4-40GB
- **VRAM**: 39.6 GB
- **RAM**: 83.5 GB
- **GPU configuration**: A100 optimized
**Training Time**: 0.0 minutes
**Training Summary:**
- **Final training loss**: 0.0000
### MLflow Tracking
- **MLflow Run ID**: c348e8235f8c40138c05c051fc207bb6
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification
## Usage
```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results: rescale boxes to the original image size
target_sizes = torch.tensor([image.size[::-1]])  # PIL gives (width, height); we need (height, width)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,
)[0]

# Print predictions
class_names = ["digital", "fisico", "tesoreria"]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {class_names[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
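The detector can return several boxes per image, while a classification use case often needs one label per voucher. One possible way to reduce the detections to a single label is to keep the highest-scoring one. The helper below is a hypothetical sketch (`top_prediction` is not part of this model or of Transformers) and works on plain Python lists:

```python
def top_prediction(scores, labels, class_names):
    """Return (class_name, score) for the highest-scoring detection, or None if there are none."""
    if not scores:
        return None
    # Index of the detection with the highest confidence.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[labels[best]], scores[best]


class_names = ["digital", "fisico", "tesoreria"]

# Example with made-up scores and labels standing in for model output.
example = top_prediction([0.62, 0.91, 0.55], [0, 1, 1], class_names)
print(example)
```

With the output of the usage snippet, it could be called as `top_prediction(results["scores"].tolist(), results["labels"].tolist(), class_names)`. Whether the top-scoring box is the right aggregation rule depends on the application; pages containing several vouchers would need a different strategy.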
## Training Procedure
The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- External train/validation split for unbiased evaluation
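The contents of the external split script are not shown in this card. As an illustration only, the sketch below shows one common way to build such a split: a stratified split that keeps the class proportions equal in both subsets. The function name, the 20% validation fraction, and the seed are assumptions, not the settings used for this model.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/val lists while preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Labels matching the class counts reported in this card (0: digital, 1: fisico, 2: tesoreria).
labels = [0] * 147 + [1] * 241 + [2] * 119
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=0)
print(len(train_idx), len(val_idx))
```

Because the dataset is pre-augmented, splitting by individual image could place augmented variants of the same source voucher in both subsets; grouping by source image before splitting would avoid that kind of leakage.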
## Limitations and Bias
- Trained specifically on voucher/receipt images
- Performance may degrade on images that differ significantly from the training distribution
- Model optimized for 3-class voucher classification task
## Citation
If you use this model, please cite:
```bibtex
@misc{rtdetr-v2-voucher-classifier,
title={RT-DETRv2 Fine-tuned for Voucher Classification},
author={Your Name},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```