---
language: en
license: apache-2.0
tags:
- vision
- image-classification
- document-classification
- knowledge-distillation
- vit
- rvl-cdip
- tiny-model
- distilled-model
datasets:
- rvl_cdip
metrics:
- accuracy
pipeline_tag: image-classification
---

# ViT-Tiny Classifier for RVL-CDIP Document Classification (Distilled)

This model is a compressed Vision Transformer (ViT-Tiny) trained via knowledge distillation from DiT-Large on the RVL-CDIP dataset for document image classification.

It was developed as part of a **research internship at the Laboratory of Complex Systems, Ecole Centrale Casablanca**.

## Model Details

- **Student Model**: ViT-Tiny (Vision Transformer)
- **Teacher Model**: microsoft/dit-large-finetuned-rvlcdip
- **Training Method**: Knowledge Distillation
- **Parameters**: ~5.5M (~55x smaller than the teacher)
- **Dataset**: RVL-CDIP (400k document images, of which 320k form the training split; 16 classes)
- **Task**: Document Image Classification
- **Accuracy**: 0.9210
- **Compression Ratio**: ~55x parameter reduction relative to the teacher model

## Document Classes

The model classifies documents into 16 categories:

1. **letter** - Personal or business correspondence
2. **form** - Structured forms and applications
3. **email** - Email communications
4. **handwritten** - Handwritten documents
5. **advertisement** - Marketing materials and ads
6. **scientific_report** - Research reports and studies
7. **scientific_publication** - Academic papers and journals
8. **specification** - Technical specifications
9. **file_folder** - File folders and organizational documents
10. **news_article** - News articles and press releases
11. **budget** - Financial budgets and planning documents
12. **invoice** - Bills and invoices
13. **presentation** - Presentation slides
14. **questionnaire** - Surveys and questionnaires
15. **resume** - CVs and resumes
16. **memo** - Internal memos and notices

## Usage

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the processor and the distilled model
processor = AutoImageProcessor.from_pretrained("HAMMALE/vit-tiny-classifier-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("HAMMALE/vit-tiny-classifier-rvlcdip")

# Load a document image (RVL-CDIP images are grayscale, so convert to RGB)
image = Image.open("path_to_your_document_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
predicted_class_id = outputs.logits.argmax(-1).item()

# Map the predicted id to its label (ordered as in the RVL-CDIP label set)
class_names = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific_report", "scientific_publication", "specification",
    "file_folder", "news_article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]
predicted_class = class_names[predicted_class_id]
print("Predicted class:", predicted_class)
```

## Performance

| Metric | Value |
|--------|-------|
| Accuracy | 0.9210 |
| Parameters | ~5.5M |
| Model Size | ~22 MB |
| Input Size | 224x224 pixels |

## Training Details

- **Student Architecture**: Vision Transformer (ViT-Tiny)
- **Teacher Model**: microsoft/dit-large-finetuned-rvlcdip
- **Distillation Method**: Knowledge Distillation
- **Input Resolution**: 224x224
- **Preprocessing**: Standard ImageNet normalization
- **Framework**: Transformers/PyTorch
- **Distillation Benefits**: Maintains high accuracy with ~55x fewer parameters

## Dataset

The RVL-CDIP (Ryerson Vision Lab Complex Document Information Processing) dataset contains:

- 400,000 grayscale document images
- 16 document categories
- Images collected from truth tobacco industry documents
- Standard train/validation/test splits (320k/40k/40k)

## Citation

```bibtex
@misc{hammale2025vit_tiny_rvlcdip_distilled,
  title={ViT-Tiny Classifier for RVL-CDIP Document Classification (Distilled)},
  author={Hammale, Mourad},
  year={2025},
  howpublished={\url{https://huggingface.co/HAMMALE/vit-tiny-classifier-rvlcdip}},
  note={Knowledge distilled from microsoft/dit-large-finetuned-rvlcdip}
}
```

## Acknowledgments

This model was created by HAMMALE (Mourad) through knowledge distillation from the larger DiT-Large model (microsoft/dit-large-finetuned-rvlcdip), achieving significant compression while maintaining competitive performance on document classification tasks.

## License

This model is released under the Apache 2.0 license.
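## Appendix: Turning Logits into Confidence Scores

The Usage example reports only the argmax class id. To attach a confidence score, the raw logits can be converted into probabilities with a softmax. The sketch below is self-contained and uses illustrative placeholder logits rather than real model output; with the actual model, the values would come from `outputs.logits[0].tolist()`.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for the 16 RVL-CDIP classes (NOT real model output)
logits = [0.1, 0.3, -0.2, 0.0, 0.5, -1.0, 0.2, 0.0,
          -0.5, 0.1, 0.4, 2.5, -0.3, 0.0, 0.1, 0.6]

class_names = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific_report", "scientific_publication", "specification",
    "file_folder", "news_article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted: {class_names[best]} (confidence {probs[best]:.3f})")
```

The max-subtraction inside `softmax` avoids overflow for large logits without changing the resulting probabilities.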