---
license: apache-2.0
base_model: PekingU/rtdetr_v2_r101vd
tags:
- object-detection
- computer-vision
- voucher-classification
- rt-detr
- rtdetrv2
datasets:
- custom-voucher-dataset
metrics:
- map
- map_50
- map_75
widget:
- src: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg
example_title: Example Image
---
# RT-DETRv2 Fine-tuned for Voucher Classification
This model is a fine-tuned version of [PekingU/rtdetr_v2_r101vd](https://huggingface.co/PekingU/rtdetr_v2_r101vd) for voucher classification and object detection.
## Model Details
### Model Description
- **Model Type**: Object Detection (RT-DETRv2)
- **Base Model**: PekingU/rtdetr_v2_r101vd
- **Task**: Multi-class voucher classification and detection
- **Classes**: 3
  - 0: `digital` (digital invoices)
  - 1: `fisico` (physical receipts on blank pages)
  - 2: `tesoreria` (small on-site payment receipts)
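
The class ids above correspond to the model's `id2label` mapping. As a quick sanity check, the mapping can be read from the hosted config alone; this is a minimal sketch and assumes the repository id used in the Usage section below.

```python
from transformers import AutoConfig

# Load only the config to inspect the label mapping without downloading weights
config = AutoConfig.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
print(config.id2label)
# Expected, based on the class list above: {0: 'digital', 1: 'fisico', 2: 'tesoreria'}
```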
### Training Details
**Training Dataset:**
- **Total Samples**: 1227
- **Class Distribution**:
  - **tesoreria** (id 2): 405 samples (33.0%)
  - **fisico** (id 1): 416 samples (33.9%)
  - **digital** (id 0): 406 samples (33.1%)
**Training Configuration** (mirrored in the hedged sketch after this list):
- **Image Size**: 832x832
- **Batch Size**: 32
- **Learning Rate**: 3e-05
- **Weight Decay**: 0.01
- **Epochs**: 80
- **Validation Split**: 0.2
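
The hyperparameters above map directly onto Hugging Face `TrainingArguments`. The following is a hedged sketch rather than the exact training script used for this model; the output directory, evaluation strategy, and scheduler defaults are assumptions.

```python
from transformers import TrainingArguments

# Sketch mirroring the listed configuration; paths and strategies are illustrative.
training_args = TrainingArguments(
    output_dir="rtdetr-v2-voucher-classifier",  # hypothetical output path
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=3e-5,
    weight_decay=0.01,
    num_train_epochs=80,
    eval_strategy="epoch",
    save_strategy="epoch",
    remove_unused_columns=False,  # keep image/annotation columns for the detection collator
)
```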
**Data Processing:**
- Pre-augmented dataset (no runtime augmentation)
- External train/validation split required (created with `create_train_val_split.py`)
- Preprocessing: resize and normalization only (see the sketch after this list)
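
A minimal sketch of the resize-and-normalize preprocessing described above, using the model's image processor. The explicit `size` override is included only to make the 832x832 resolution visible; whether the hosted processor config already carries this value is an assumption.

```python
from transformers import AutoImageProcessor
from PIL import Image

# The processor handles resizing and normalization; no augmentation is applied here.
image_processor = AutoImageProcessor.from_pretrained(
    "jnmrr/rtdetr-v2-voucher-classifier",
    size={"height": 832, "width": 832},  # matches the listed input resolution
)

image = Image.open("path/to/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 3, 832, 832])
```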
### Performance Metrics
**Metric Definitions:**
- **mAP (mean Average Precision)**: Overall performance averaged across all classes and IoU thresholds (0.0-1.0, higher is better)
- **mAP@50**: mAP at an IoU threshold of 0.5; more lenient, measuring whether objects are found in roughly the correct location
- **mAP@75**: mAP at an IoU threshold of 0.75; stricter, requiring precise bounding-box localization
- **IoU (Intersection over Union)**: Overlap between predicted and ground-truth bounding boxes (a worked sketch follows this list)
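
To make the IoU definition concrete, here is a small self-contained example for two axis-aligned boxes in `(x_min, y_min, x_max, y_max)` format; the boxes are made up for illustration.

```python
def box_iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    # Intersection rectangle (zero if the boxes do not overlap)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction shifted 25 px from a 100x100 ground-truth box
print(box_iou((0, 0, 100, 100), (25, 25, 125, 125)))  # ~0.39, below the 0.5 threshold
```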
**Performance Ranges:**
- 0.9+: Excellent
- 0.8-0.9: Very Good
- 0.7-0.8: Good
- 0.5-0.7: Fair
- <0.5: Poor (needs improvement)
**Final Evaluation Results:**
**Overall Detection Performance:**
- **mAP**: 0.0000
- **mAP@50**: 0.0000
- **mAP@75**: 0.0000
**Per-Class Average Precision:**
- **Digital invoices**: 0.0000 (needs improvement)
- **Fisico receipts**: 0.0000 (needs improvement)
- **Tesoreria receipts**: 0.0000 (needs improvement)
**Model Confidence:**
- **Digital invoices mean confidence**: 0.4346 (low)
- **Fisico receipts mean confidence**: 0.0000 (low)
- **Tesoreria receipts mean confidence**: 0.0000 (low)
**Performance by Object Size:**
- **Small objects**: -1.0000
- **Medium objects**: -1.0000
- **Large objects**: 0.0000

(In COCO-style evaluation, -1 typically indicates that the evaluation set contains no ground-truth objects in that size bucket.)
**Evaluation Dataset:**
- **Digital invoices**: 53 samples (27.5%)
- **Fisico receipts**: 127 samples (65.8%)
- **Tesoreria receipts**: 13 samples (6.7%)
- **Total evaluation samples**: 193
**Model Configuration:**
- **Base model**: PekingU/rtdetr_v2_r101vd
- **Architecture**: rtdetr_v2_r101vd
- **Input resolution**: 832×832 pixels
- **Training epochs**: 80
- **Batch size**: 32
**Training Hardware:**
- **GPU**: NVIDIA H100 80GB HBM3
- **VRAM**: 79.2 GB
- **RAM**: 235.9 GB
- **GPU configuration**: H100 optimized
**Training Time**: 39.6 minutes
**Training Summary:**
- **Final training loss**: 4.9881
- **Final learning rate**: 2.08e-08
### MLflow Tracking
- **MLflow Run ID**: 1690d8d04ea74ca99f0fea73a8466f83
- **MLflow Experiment**: RT-DETRv2_Voucher_Classification
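
Assuming access to the original MLflow tracking server (which is not bundled with this repository), the logged parameters and metrics could be retrieved with the standard MLflow client; the tracking URI below is a placeholder.

```python
import mlflow

# Point at the tracking server that hosted the training run (URI is an assumption).
mlflow.set_tracking_uri("http://your-mlflow-server:5000")

run = mlflow.get_run("1690d8d04ea74ca99f0fea73a8466f83")
print(run.data.params)   # logged hyperparameters
print(run.data.metrics)  # logged metrics
```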
## Usage
```python
from transformers import AutoModelForObjectDetection, AutoImageProcessor
import torch
from PIL import Image

# Load model and processor
model = AutoModelForObjectDetection.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")
image_processor = AutoImageProcessor.from_pretrained("jnmrr/rtdetr-v2-voucher-classifier")

# Load and preprocess image
image = Image.open("path/to/your/voucher.jpg").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process results
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.5,
)[0]

# Print predictions
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Class: {model.config.id2label[label.item()]}")
    print(f"Confidence: {score.item():.3f}")
    print(f"BBox: {box.tolist()}")
```
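
Given the mean confidences reported above (for example 0.4346 for digital invoices), detections may fall below the default `threshold=0.5`. A small follow-up to the example above shows re-running post-processing with a lower threshold; the value 0.3 is an assumption and should be tuned on a held-out set.

```python
# Reuse `outputs`, `target_sizes`, and `image_processor` from the snippet above.
results_low = image_processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.3,  # assumed value; tune rather than hard-code
)[0]
print(f"Detections at 0.5: {len(results['scores'])}, at 0.3: {len(results_low['scores'])}")
```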
## Training Procedure
The model was fine-tuned using the Hugging Face Transformers library with:
- Pre-augmented dataset focusing on challenging cases
- Format-specific augmentation strategies applied during data preparation
- MLflow experiment tracking for reproducibility
- External train/validation split required for unbiased evaluation (no fallback to training data); a hedged splitting sketch follows
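
The `create_train_val_split.py` script referenced above is not included in this repository. Purely as an illustration, a hedged sketch of an equivalent 80/20 split over a COCO-style annotation file might look like the following; file names and directory layout are assumptions.

```python
import json
import random
from pathlib import Path

# Assumed layout: one COCO-style annotation file covering the full dataset.
coco = json.loads(Path("annotations.json").read_text())
images = list(coco["images"])

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(images)

cut = int(len(images) * 0.8)  # matches the listed 0.2 validation split
train_imgs, val_imgs = images[:cut], images[cut:]
train_ids = {img["id"] for img in train_imgs}

train = {**coco, "images": train_imgs,
         "annotations": [a for a in coco["annotations"] if a["image_id"] in train_ids]}
val = {**coco, "images": val_imgs,
       "annotations": [a for a in coco["annotations"] if a["image_id"] not in train_ids]}

Path("train.json").write_text(json.dumps(train))
Path("val.json").write_text(json.dumps(val))
```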
## Limitations and Bias
- Trained specifically on voucher/receipt images
- Performance may vary on images significantly different from training distribution
- Model optimized for 3-class voucher classification task
## Citation
If you use this model, please cite:
```bibtex
@misc{rtdetr-v2-voucher-classifier,
  title={RT-DETRv2 Fine-tuned for Voucher Classification},
  author={Your Name},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/jnmrr/rtdetr-v2-voucher-classifier}
}
```