---
library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
- image-classification
- vision
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: newspaper_classifier_segformer
  results: []
datasets:
- taresco/newspaper_ocr
language:
- en
- yo
pipeline_tag: image-classification
---

# newspaper_classifier_segformer

This model is a fine-tuned version of `nvidia/mit-b0` on the `taresco/newspaper_ocr` dataset. It classifies text document images into two categories: those requiring special segmentation processing (`segment`) and those that don't (`no_segment`). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.

## Model Details

- **Base Architecture**: SegFormer (`nvidia/mit-b0`) - a transformer-based architecture that balances efficiency and performance for vision tasks
- **Training Dataset**: `taresco/newspaper_ocr` - a specialized collection of newspaper document images with segmentation labels
- **Input Format**: RGB images resized to 512×512 pixels
- **Output Classes**:
  - `segment`: Images containing two or more distinct, unrelated text segments that require special OCR processing
  - `no_segment`: Images containing a single, cohesive block of content that can follow the standard OCR processing path
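
The exact label mapping is stored in the checkpoint's config and can be verified directly. A minimal sketch, assuming the standard `id2label` field is populated:

```python
from transformers import AutoConfig

# Download only the config and inspect the class-id to label-name mapping
config = AutoConfig.from_pretrained("taresco/newspaper_classifier_segformer")
print(config.id2label)  # expected to map ids to "no_segment" / "segment"
```
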
## Intended Uses & Applications

- **OCR Pipeline Integration**: Primary use is as a preprocessing classifier in OCR workflows for document digitization
- **Document Routing**: Automatically route documents to specialized segmentation processing when needed (see the sketch after this list)
- **Batch Processing**: Efficiently handle large collections of document archives by applying appropriate processing techniques
- **Digital Library Processing**: Support for historical text document digitization projects
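
To illustrate the routing use case, the snippet below branches on the top predicted label. It is a sketch only: `process_with_segmentation` and `process_standard` are hypothetical placeholders for your own OCR paths.

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

def route_document(image_path, process_with_segmentation, process_standard):
    """Send an image down the segmentation-aware or the standard OCR path."""
    top_prediction = pipe(image_path)[0]  # pipeline output is sorted by score
    if top_prediction["label"] == "segment":
        return process_with_segmentation(image_path)
    return process_standard(image_path)
```
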
## Training and evaluation data

The model was fine-tuned on the `taresco/newspaper_ocr` dataset, which contains newspaper images labeled as either `segment` or `no_segment`.

Dataset splits:

- Training set: 19,111 examples, with 15% of this split held out as a validation set during training
- Test set: 4,787 examples
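
A comparable split can be reconstructed as follows. This is a sketch and assumes the dataset exposes `train` and `test` splits; the actual configuration may differ.

```python
from datasets import load_dataset

ds = load_dataset("taresco/newspaper_ocr")

# Hold out 15% of the training split for validation, mirroring the setup above
split = ds["train"].train_test_split(test_size=0.15, seed=42)
train_ds, val_ds, test_ds = split["train"], split["test"], ds["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```
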
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- num_epochs: 3
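
For reference, here is a minimal fine-tuning sketch that mirrors these hyperparameters. It is not the exact training script: the `image`/`label` column names, the preprocessing details, and the use of the `evaluate` library are assumptions, and `train_ds`/`val_ds` come from the dataset sketch above.

```python
import numpy as np
import torch
import evaluate
from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                          Trainer, TrainingArguments)

labels = ["no_segment", "segment"]
processor = AutoImageProcessor.from_pretrained("nvidia/mit-b0")
model = AutoModelForImageClassification.from_pretrained(
    "nvidia/mit-b0",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
    ignore_mismatched_sizes=True,  # swap the pretrained head for a 2-class head
)

def transform(batch):
    # The SegFormer processor resizes to 512x512 and normalizes the images
    inputs = processor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

train_ds.set_transform(transform)
val_ds.set_transform(transform)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, references = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1), references=references)

args = TrainingArguments(
    output_dir="newspaper_classifier_segformer",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    remove_unused_columns=False,  # keep the raw "image" column for the on-the-fly transform
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
)
trainer.train()
```
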
### Training results

The model achieved the following results on the evaluation set:

- Loss: 0.0198
- Accuracy: 99.62%

Classification report on the test set:

```text
              precision    recall  f1-score   support

  no_segment       1.00      0.99      1.00      4471
     segment       0.91      0.98      0.95       316

    accuracy                           0.99      4787
   macro avg       0.95      0.99      0.97      4787
weighted avg       0.99      0.99      0.99      4787
```
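
A report in this format can be regenerated with scikit-learn. The sketch below assumes the `test` split and the `image`/`label` columns used earlier, and that the dataset's label ids follow the same mapping as the model config.

```python
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")
id2label = pipe.model.config.id2label

test_ds = load_dataset("taresco/newspaper_ocr", split="test")

# Predict the top label for every test image and compare against the references
predictions = [out[0]["label"] for out in pipe(test_ds["image"], batch_size=16)]
references = [id2label[i] for i in test_ds["label"]]

print(classification_report(references, predictions, digits=2))
```
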
## How to Use

You can use this model with the Hugging Face `transformers` library:

```python
from transformers import pipeline

# Load the pipeline
pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

# Classify an image
image_path = "path_to_your_image.jpg"
result = pipe(image_path)
print(result)
```
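
If you need explicit control over preprocessing and scores, the lower-level API can be used instead. This is a sketch and assumes the checkpoint ships its image processor configuration:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("taresco/newspaper_classifier_segformer")
model = AutoModelForImageClassification.from_pretrained("taresco/newspaper_classifier_segformer")
model.eval()

image = Image.open("path_to_your_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")  # resizes and normalizes to the expected 512x512 input

with torch.no_grad():
    logits = model(**inputs).logits

probabilities = logits.softmax(dim=-1)[0]
predicted_label = model.config.id2label[int(probabilities.argmax())]
print(predicted_label, float(probabilities.max()))
```
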
### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0