---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
example_title: Example Image
---
# RT-DETRv2 Fine-tuned for Voucher Classification
This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.
## Model Details
### Model Description
- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3
  - 0: `digital` (digital invoices)
  - 1: `fisico` (physical receipts on blank pages)
  - 2: `tesoreria` (small on-site payment receipts)
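
The class ids above correspond to the model's `id2label` mapping. As a quick sanity check, the mapping can be read from the hosted config alone; this is a minimal sketch and assumes the repository id used in the Usage section below.

```python
from transformers import AutoConfig

# Load only the config to inspect the label mapping without downloading weights
config = AutoConfig.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
print(config.id2label)
# Expected, based on the class list above: {0: 'digital', 1: 'fisico', 2: 'tesoreria'}
```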
### Training Details
**Training Dataset:**
- **Total Samples**: 1227
- **Class Distribution**:
  - **tesoreria** (id 2): 405 samples (33.0%)
  - **fisico** (id 1): 416 samples (33.9%)
  - **digital** (id 0): 406 samples (33.1%)
**Training Configuration** (mirrored in the hedged sketch after this list):
- **Image Size**: 832x832
- **Batch Size**: 32
- **Learning Rate**: 3e-05
- **Weight Decay**: 0.01
- **Epochs**: 80
- **Validation Split**: 0.2
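
The hyperparameters above map directly onto Hugging Face `TrainingArguments`. The following is a hedged sketch rather than the exact training script used for this model; the output directory, evaluation strategy, and scheduler defaults are assumptions.

```python
from transformers import TrainingArguments

# Sketch mirroring the listed configuration; paths and strategies are illustrative.
training_args = TrainingArguments(
    output_dir="rtdetr-v2-voucher-classifier",  # hypothetical output path
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=3e-5,
    weight_decay=0.01,
    num_train_epochs=80,
    eval_strategy="epoch",
    save_strategy="epoch",
    remove_unused_columns=False,  # keep image/annotation columns for the detection collator
)
```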
**Data Processing:**
- Pre-augmented dataset (no runtime augmentation)
- External train/validation split required (created with `create_train_val_split.py`)
- Preprocessing: resize and normalization only (see the sketch after this list)
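
A minimal sketch of the resize-and-normalize preprocessing described above, using the model's image processor. The explicit `size` override is included only to make the 832x832 resolution visible; whether the hosted processor config already carries this value is an assumption.

```python
from transformers import AutoImageProcessor
from PIL import Image

# The processor handles resizing and normalization; no augmentation is applied here.
image_processor = AutoImageProcessor.from_pretrained(
    "jnmrr/rtdetr-v2-voucher-classifier",
    size={"height": 832, "width": 832},  # matches the listed input resolution
)

image = Image.open("path/to/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 3, 832, 832])
```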
### Performance Metrics
**Metric Definitions:**
- **mAP (mean Average Precision)**: Overall performance averaged across all classes and IoU thresholds (0.0-1.0, higher is better)
- **mAP@50**: mAP at an IoU threshold of 0.5; more lenient, measuring whether objects are found in roughly the correct location
- **mAP@75**: mAP at an IoU threshold of 0.75; stricter, requiring precise bounding-box localization
- **IoU (Intersection over Union)**: Overlap between predicted and ground-truth bounding boxes (a worked sketch follows this list)
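
To make the IoU definition concrete, here is a small self-contained example for two axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format; the boxes are made up for illustration.

```python
def box_iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    # Intersection rectangle (zero if the boxes do not overlap)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction shifted 25 px from a 100x100 ground-truth box
print(box_iou((0, 0, 100, 100), (25, 25, 125, 125)))  # ~0.39, below the 0.5 threshold
```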
**Performance Ranges:**
- 0.9+: Excellent
- 0.8-0.9: Very Good
- 0.7-0.8: Good
- 0.5-0.7: Fair
- <0.5: Poor (needs improvement)
**Final Evaluation Results:**
**Overall Detection Performance:**
- **mAP**: 0.0000
- **mAP@50**: 0.0000
- **mAP@75**: 0.0000
**Per-Class Average Precision:**
- **Digital invoices**: 0.0000 (needs improvement)
- **Fisico receipts**: 0.0000 (needs improvement)
- **Tesoreria receipts**: 0.0000 (needs improvement)
**Model Confidence:**
- **Digital invoices mean confidence**: 0.4346 (low)
- **Fisico receipts mean confidence**: 0.0000 (low)
- **Tesoreria receipts mean confidence**: 0.0000 (low)
**Performance by Object Size:**
- **Small objects**: -1.0000
- **Medium objects**: -1.0000
- **Large objects**: 0.0000

(In COCO-style evaluation, -1 typically indicates that the evaluation set contains no ground-truth objects in that size bucket.)
**Evaluation Dataset:**
- **Digital invoices**: 53 samples (27.5%)
- **Fisico receipts**: 127 samples (65.8%)
- **Tesoreria receipts**: 13 samples (6.7%)
- **Total evaluation samples**: 193
**Model Configuration:**
- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 832×832 pixels
- **Training epochs**: 80
- **Batch size**: 32
**Training Hardware:**
- **GPU**: NVIDIA H100 80GB HBM3
- **VRAM**: 79.2 GB
- **RAM**: 235.9 GB
- **GPU configuration**: H100 optimized
**Training Time**: 39.6 minutes
**Training Summary:**
- **Final training loss**: 4.9881
- **Final learning rate**: 2.08e-08
### MLflow Tracking
- **MLflow Run ID**: 1690d8d04ea74ca99f0fea73a8466f83
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification
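
Assuming access to the original MLflow tracking server (which is not bundled with this repository), the logged parameters and metrics could be retrieved with the standard MLflow client; the tracking URI below is a placeholder.

```python
import mlflow

# Point at the tracking server that hosted the training run (URI is an assumption).
mlflow.set_tracking_uri("http://your-mlflow-server:5000")

run = mlflow.get_run("1690d8d04ea74ca99f0fea73a8466f83")
print(run.data.params)   # logged hyperparameters
print(run.data.metrics)  # logged metrics
```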
## Usage
```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,
)[0]

# Print predictions
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {model.config.id2label[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
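
Given the mean confidences reported above (for example 0.4346 for digital invoices), detections may fall below the default `threshold=0.5`. A small follow-up to the example above shows re-running post-processing with a lower threshold; the value 0.3 is an assumption and should be tuned on a held-out set.

```python
# Reuse `outputs`, `target_sizes`, and `image_processor` from the snippet above.
results_low = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.3,  # assumed value; tune rather than hard-code
)[0]
print(f"Detections at 0.5: {len(results['scores'])}, at 0.3: {len(results_low['scores'])}")
```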
## Training Procedure
The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- External train/validation split required for unbiased evaluation (no fallback to training data); a hedged splitting sketch follows
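
The `create_train_val_split.py` script referenced above is not included in this repository. Purely as an illustration, a hedged sketch of an equivalent 80/20 split over a COCO-style annotation file might look like the following; file names and directory layout are assumptions.

```python
import json
import random
from pathlib import Path

# Assumed layout: one COCO-style annotation file covering the full dataset.
coco = json.loads(Path("annotations.json").read_text())
images = list(coco["images"])

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(images)

cut = int(len(images) * 0.8)  # matches the listed 0.2 validation split
train_imgs, val_imgs = images[:cut], images[cut:]
train_ids = {img["id"] for img in train_imgs}

train = {**coco, "images": train_imgs,
         "annotations": [a for a in coco["annotations"] if a["image_id"] in train_ids]}
val = {**coco, "images": val_imgs,
       "annotations": [a for a in coco["annotations"] if a["image_id"] not in train_ids]}

Path("train.json").write_text(json.dumps(train))
Path("val.json").write_text(json.dumps(val))
```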
## Limitations and Bias
- Trained specifically on voucher/receipt images
- Performance may vary on images significantly different from training distribution
- Model optimized for 3-class voucher classification task
## Citation
If you use this model, please cite:
```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```