---
library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
- image-classification
- vision
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: newspaper_classifier_segformer
results: []
datasets:
- taresco/newspaper_ocr
language:
- en
- yo
pipeline_tag: image-classification
---
# newspaper_classifier_segformer
This model is a fine-tuned version of `nvidia/mit-b0` on the `taresco/newspaper_ocr` dataset. It classifies text document images into two categories: those requiring special segmentation processing (`segment`) and those that don't (`no_segment`). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.
## Model Details
- **Base Architecture**: SegFormer (`nvidia/mit-b0`) - a transformer-based architecture that balances efficiency and performance for vision tasks
- **Training Dataset**: `taresco/newspaper_ocr` - a specialized collection of text document images with segmentation annotations
- **Input Format**: RGB images resized to 512×512 pixels
- **Output Classes**:
- `segment`: Images containing two or more distinct, unrelated text segments that require special OCR processing
  - `no_segment`: Images containing single, cohesive content that can follow standard OCR processing
## Intended Uses & Applications
- **OCR Pipeline Integration**: Primary use is as a preprocessing classifier in OCR workflows for document digitization
- **Document Routing**: Automatically route documents to specialized segmentation processing when needed
- **Batch Processing**: Efficiently handle large collections of document archives by applying appropriate processing techniques
- **Digital Library Processing**: Support for historical text document digitization projects
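The document-routing use case can be sketched as a small helper that inspects the pipeline's output and picks a processing path. This is a minimal illustration, not part of the released model: the `route` function, the path names, and the confidence threshold are all hypothetical.

```python
def route(predictions, threshold=0.5):
    """Pick a processing path from image-classification pipeline output.

    `predictions` is a list of {"label": ..., "score": ...} dicts, as
    returned by a transformers image-classification pipeline.
    The function name, path names, and threshold are illustrative.
    """
    top = max(predictions, key=lambda p: p["score"])
    if top["label"] == "segment" and top["score"] >= threshold:
        return "segmentation_ocr"
    return "standard_ocr"

# Mock pipeline output for demonstration:
preds = [{"label": "segment", "score": 0.97},
         {"label": "no_segment", "score": 0.03}]
print(route(preds))  # segmentation_ocr
```

In a batch setting, the same helper can be applied per image to fan documents out to the appropriate OCR path.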
## Training and evaluation data
The model was fine-tuned on the `taresco/newspaper_ocr` dataset, which contains newspaper images labeled as either `segment` or `no_segment`.
Dataset splits:
- **Training set**: 19,111 examples, with 15% of this split held out as a validation set during training
- **Test set**: 4,787 examples
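The 15% validation holdout described above works out to roughly 2,866 validation examples. A minimal sketch of that carve-out (the helper itself is illustrative; the exact shuffling used in training is not documented here, though the seed matches the hyperparameters below):

```python
import random

def split_train_validation(n_examples, val_fraction=0.15, seed=42):
    """Carve a validation set out of the training split.

    Returns (train_indices, val_indices). The 15% fraction and seed
    mirror the card; the shuffling scheme is an assumption.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_examples * val_fraction)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_train_validation(19_111)
print(len(train_idx), len(val_idx))  # 16245 2866
```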
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- num_epochs: 3
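These hyperparameters map directly onto a `transformers.TrainingArguments` configuration. A sketch, assuming training was done with the standard `Trainer` API (the `output_dir` name is a placeholder):

```python
from transformers import TrainingArguments

# Illustrative TrainingArguments mirroring the hyperparameters above.
training_args = TrainingArguments(
    output_dir="newspaper_classifier_segformer",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```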
### Training results
The model achieved the following results on the evaluation set:
- Loss: 0.0198
- Accuracy: 99.62%
```text
              precision    recall  f1-score   support

  no_segment       1.00      0.99      1.00      4471
     segment       0.91      0.98      0.95       316

    accuracy                           0.99      4787
   macro avg       0.95      0.99      0.97      4787
weighted avg       0.99      0.99      0.99      4787
```
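The per-class numbers above follow the standard precision/recall definitions. As a quick sanity check, here are those formulas applied to hypothetical confusion-matrix counts chosen to be roughly consistent with the `segment` row (the counts themselves are not from the evaluation run):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, roughly matching the `segment` row
# (support 316, precision ~0.91, recall ~0.98):
p, r, f1 = precision_recall_f1(tp=310, fp=31, fn=6)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.91 0.98 0.94
```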
## How to Use
You can use this model with the Hugging Face `transformers` library:
```python
from transformers import pipeline

# Load the classification pipeline
pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

# Classify an image (accepts a file path, URL, or PIL image)
result = pipe("path_to_your_image.jpg")
print(result)
```
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0