---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: >-
    https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---
RT-DETRv2 Fine-tuned for Voucher Classification
This model is a fine-tuned version of PekingU/rtdetr_v2_r101vd for voucher classification and object detection.
Model Details
Model Description
- Model Type: Object Detection (RT-DETRv2)
- Base Model: PekingU/rtdetr_v2_r101vd
- Task: Multi-class voucher classification and detection
- Classes: 3
  - 0: digital (digital invoices)
  - 1: fisico (physical receipts on blank pages)
  - 2: tesoreria (small on-site payment receipts)
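Transformers detection models keep their class names in the config as `id2label` and `label2id`. The following minimal sketch builds those mappings for the three classes above. It assumes the published checkpoint uses exactly these names, which you can confirm by inspecting `model.config.id2label`.

```python
# Class ids as listed in this card. That the published config uses exactly
# these strings is an assumption; verify with model.config.id2label.
ID2LABEL = {0: "digital", 1: "fisico", 2: "tesoreria"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

# When fine-tuning from the base checkpoint, mappings like these are
# typically passed to from_pretrained so the classification head is resized:
# AutoModelForObjectDetection.from_pretrained(
#     "PekingU/rtdetr_v2_r101vd",
#     id2label=ID2LABEL,
#     label2id=LABEL2ID,
#     ignore_mismatched_sizes=True,
# )
print(LABEL2ID)
```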
Training Details
Training Dataset:
- Total Samples: 507
- Class Distribution:
  - fisico (id: 1): 241 samples (47.5%)
  - digital (id: 0): 147 samples (29.0%)
  - tesoreria (id: 2): 119 samples (23.5%)
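The following snippet checks the class shares listed above arithmetically. It also computes the ratio between the largest and smallest class as a simple measure of imbalance.

```python
# Per-class sample counts from the training dataset listed above.
counts = {"fisico": 241, "digital": 147, "tesoreria": 119}
total = sum(counts.values())

# Share of each class in percent, rounded to one decimal place as in the card.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Largest class divided by smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)                     # 507
print(shares)                    # {'fisico': 47.5, 'digital': 29.0, 'tesoreria': 23.5}
print(round(imbalance_ratio, 2)) # 2.03
```

The fisico class therefore has roughly twice as many samples as tesoreria.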
Training Configuration:
- Image Size: 800x800
- Batch Size: 24
- Learning Rate: 1.5e-05
- Weight Decay: 0.0001
- Epochs: 2
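- Validation Split: 0.0 (no in-training split; validation data comes from the external split described under Data Processing)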
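The following sketch restates the configuration above as a dictionary and derives the number of optimizer steps it implies. The keys mirror parameter names of `transformers.TrainingArguments`, but the card does not confirm that the run used `TrainingArguments`/`Trainer`. The step count also assumes a single GPU, no gradient accumulation, and that the last partial batch is kept.

```python
import math

# Hyperparameters from the training configuration above. Key names follow
# transformers.TrainingArguments (an assumption about how the run was set up),
# e.g. TrainingArguments(output_dir="out", **hparams).
hparams = {
    "per_device_train_batch_size": 24,
    "learning_rate": 1.5e-5,
    "weight_decay": 1e-4,
    "num_train_epochs": 2,
}
num_train_samples = 507

# Assumes one device, no gradient accumulation, and drop_last=False,
# so a final partial batch still counts as one step.
steps_per_epoch = math.ceil(num_train_samples / hparams["per_device_train_batch_size"])
total_steps = steps_per_epoch * hparams["num_train_epochs"]

print(steps_per_epoch, total_steps)  # 22 44
```

Under these assumptions, the full run amounts to 44 optimizer updates.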
Data Processing:
- Pre-augmented dataset used (no runtime augmentation)
- Train/validation split created externally, before training, with create_train_val_split.py
- Preprocessing: Resize + Normalization only
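The "Resize + Normalization" step can be sketched with Pillow and NumPy. This is an illustration only, not the model's actual image processor. The bilinear resampling and the ImageNet mean/std below are assumptions; the authoritative settings live in the checkpoint's `preprocessor_config.json`, which `AutoImageProcessor` applies automatically.

```python
import numpy as np
from PIL import Image

# Assumed normalization statistics (ImageNet). Check the checkpoint's
# preprocessor_config.json for the values actually used.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image: Image.Image, size: int = 800) -> np.ndarray:
    """Resize to size x size, rescale to [0, 1], normalize, return CHW float32."""
    resized = image.convert("RGB").resize((size, size), Image.BILINEAR)
    pixels = np.asarray(resized, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    pixels = (pixels - MEAN) / STD                          # per-channel normalization
    return pixels.transpose(2, 0, 1)                        # HWC -> CHW


# Example with a synthetic white A4-sized page (hypothetical input).
page = Image.new("RGB", (1240, 1754), color=(255, 255, 255))
tensor = preprocess(page)
print(tensor.shape, tensor.dtype)  # (3, 800, 800) float32
```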
Performance Metrics
Final Evaluation Results:

Dataset Information:

Training Dataset:
- Digital invoices: 147 samples (29.0%)
- Fisico receipts: 241 samples (47.5%)
- Tesoreria receipts: 119 samples (23.5%)
- Total training samples: 507
Model Configuration:
- Base model: PekingU/rtdetr_v2_r101vd
- Architecture: rtdetr_v2_r101vd
- Input resolution: 800×800 pixels
- Training epochs: 2
- Batch size: 24
Training Hardware:
- GPU: NVIDIA A100-SXM4-40GB
- VRAM: 39.6 GB
- RAM: 83.5 GB
- GPU configuration: A100 optimized
Training Time: 0.0 minutes
Training Summary:
- Final training loss: 0.0000
MLflow Tracking
- MLflow Run ID: c348e8235f8c40138c05c051fc207bb6
- MLflow Experiment: RT-DETRv2_Voucher_Classification
Usage
```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results: rescale boxes to the original image size.
# PIL reports (width, height), so reverse it to get (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,
)[0]

# Print predictions
class_names = ["digital", "fisico", "tesoreria"]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {class_names[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")  # [x_min, y_min, x_max, y_max] in pixels
```
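The model outputs per-box detections, while the task is framed as voucher classification. One simple way to reduce detections to a single label per image is to keep the highest-scoring box, assuming each image contains one voucher. The helper below sketches that rule. `classify_voucher` is a hypothetical name and is not part of Transformers.

```python
from typing import Optional, Sequence

CLASS_NAMES = ["digital", "fisico", "tesoreria"]


def classify_voucher(
    scores: Sequence[float],
    labels: Sequence[int],
    min_score: float = 0.5,
) -> Optional[str]:
    """Return the class of the highest-scoring detection, or None if none reaches min_score."""
    best_idx = None
    for i, score in enumerate(scores):
        if score >= min_score and (best_idx is None or score > scores[best_idx]):
            best_idx = i
    return None if best_idx is None else CLASS_NAMES[labels[best_idx]]


# With the results dict from the usage snippet:
# classify_voucher(results["scores"].tolist(), results["labels"].tolist())
print(classify_voucher([0.62, 0.91, 0.55], [0, 1, 0]))  # fisico
print(classify_voucher([0.2], [2]))                     # None
```

Returning `None` instead of a forced label makes it easy to route low-confidence images to manual review.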
Training Procedure
The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- External train/validation split, so that validation images are held out from training
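The card's split script, create_train_val_split.py, is not included here. As a hypothetical stand-in, the following sketch shows a stratified split that keeps class proportions similar in both subsets. The function name, the 20% validation fraction, the seed, and the one-label-per-image assumption are all illustrative.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (image_id, class_id) pairs into train/val lists, class by class.

    Assumes one voucher class per image; val_fraction and seed are illustrative.
    """
    by_class = defaultdict(list)
    for image_id, class_id in samples:
        by_class[class_id].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for class_id, ids in sorted(by_class.items()):
        ids = sorted(ids)
        rng.shuffle(ids)
        n_val = round(len(ids) * val_fraction)
        val += [(i, class_id) for i in ids[:n_val]]
        train += [(i, class_id) for i in ids[n_val:]]
    return train, val


# Toy data with the card's class counts (image ids are synthetic).
counts = {1: 241, 0: 147, 2: 119}
samples = [(f"img_{c}_{k}", c) for c, n in counts.items() for k in range(n)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 406 101
```

Because the dataset is pre-augmented, augmented variants of the same source image should be grouped under one id before splitting. Otherwise, near-duplicates can end up in both subsets.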
Limitations and Bias
- Trained specifically on voucher/receipt images
- Performance may vary on images that differ significantly from the training distribution
- Optimized for the 3-class voucher task (digital, fisico, tesoreria); there is no class for other document types
Citation
If you use this model, please cite:
```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```