---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
  example_title: Example Image
---

# RT-DETRv2 Fine-tuned for Voucher Classification

This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.

## Model Details

### Model Description

- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3 classes
  - 0: digital (digital invoices)
  - 1: fisico (physical receipts on blank pages)
  - 2: tesoreria (small on-site payment receipts)

### Training Details

**Training Dataset:**

- **Total Samples**: 507
- **Class Distribution**:
  - **fisico** (id: 1): 241 samples (47.5%)
  - **digital** (id: 0): 147 samples (29.0%)
  - **tesoreria** (id: 2): 119 samples (23.5%)

**Training Configuration:**

- **Image Size**: 800x800
- **Batch Size**: 24
- **Learning Rate**: 1.5e-05
- **Weight Decay**: 0.0001
- **Epochs**: 2
- **Validation Split**: 0.0

**Data Processing:**

- Pre-augmented dataset used (no runtime augmentation)
- External train/validation split (use `create_train_val_split.py`)
- Preprocessing: Resize + Normalization only

### Performance Metrics

**Final Evaluation Results:**

**Dataset Information:**

*Training Dataset:*

- **Digital invoices**: 147 samples (29.0%)
- **Fisico receipts**: 241 samples (47.5%)
- **Tesoreria receipts**: 119 samples (23.5%)
- **Total training samples**: 507

**Model Configuration:**

- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 800×800 pixels
- **Training epochs**: 2
- **Batch size**: 24

**Training Hardware:**

- **GPU**: NVIDIA A100-SXM4-40GB
- **VRAM**: 39.6 GB
- **RAM**: 83.5 GB
- **GPU configuration**: A100 optimized

**Training Time**: 0.0 minutes

**Training Summary:**

- **Final training loss**: 0.0000

### MLflow Tracking

- **MLflow Run ID**: c348e8235f8c40138c05c051fc207bb6
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification

## Usage

```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image
import numpy as np

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.5
)[0]

# Print predictions
class_names = ["digital", "fisico", "tesoreria"]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {class_names[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```

## Training Procedure

The model was fine-tuned using the Hugging Face Transformers library with:

- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- External train/validation split for unbiased evaluation

## Limitations and Bias

- Trained specifically on voucher/receipt images
- Performance may vary on images significantly different from training distribution
- Model optimized for 3-class voucher classification task

## Citation

If you use this model, please cite:

```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```
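
## Image-Level Label Sketch

The Usage example prints every detection above the threshold. When a single label per image is wanted, one possible approach is to keep only the most confident detection. The sketch below is a minimal illustration under that assumption; `top_voucher_class` and its `min_score` parameter are hypothetical names introduced here and are not part of the model repository. It operates on plain Python lists, so tensors from `post_process_object_detection` would need `.tolist()` first.

```python
from typing import Optional, Sequence, Tuple

# Class order as listed in this model card (ids 0, 1, 2).
CLASS_NAMES = ["digital", "fisico", "tesoreria"]


def top_voucher_class(
    scores: Sequence[float],
    labels: Sequence[int],
    min_score: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Return (class_name, score) for the most confident detection.

    Hypothetical helper: returns None when no detection reaches min_score.
    """
    best: Optional[Tuple[str, float]] = None
    for score, label in zip(scores, labels):
        if score < min_score:
            continue  # skip detections below the confidence cutoff
        if best is None or score > best[1]:
            best = (CLASS_NAMES[label], score)
    return best
```

With the `results` dictionary from the Usage section, a call could look like `top_voucher_class(results["scores"].tolist(), results["labels"].tolist())`. Taking the single highest score is only one design choice; images containing several vouchers would need a different reduction.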