---
library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
- image-classification
- vision
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: newspaper_classifier_segformer
  results: []
datasets:
- taresco/newspaper_ocr
language:
- en
- yo
pipeline_tag: image-classification
---

# newspaper_classifier_segformer

This model is a fine-tuned version of `nvidia/mit-b0` on the `taresco/newspaper_ocr` dataset. It classifies text document images into two categories: those requiring special segmentation processing (`segment`) and those that don't (`no_segment`). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.

## Model Details

- **Base Architecture**: SegFormer (`nvidia/mit-b0`) - a transformer-based architecture that balances efficiency and performance for vision tasks
- **Training Dataset**: `taresco/newspaper_ocr` - a specialized collection of newspaper document images with segmentation labels
- **Input Format**: RGB images resized to 512×512 pixels
- **Output Classes**:
  - `segment`: Images containing two or more distinct, unrelated text segments that require special OCR processing
  - `no_segment`: Images containing a single, cohesive block of content that can follow the standard OCR processing path
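
The exact label mapping is stored in the checkpoint's config and can be verified directly. A minimal sketch, assuming the standard `id2label` field is populated:

```python
from transformers import AutoConfig

# Download only the config and inspect the class-id to label-name mapping
config = AutoConfig.from_pretrained("taresco/newspaper_classifier_segformer")
print(config.id2label)  # expected to map ids to "no_segment" / "segment"
```
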
## Intended Uses & Applications

- **OCR Pipeline Integration**: Primary use is as a preprocessing classifier in OCR workflows for document digitization
- **Document Routing**: Automatically route documents to specialized segmentation processing when needed (see the sketch after this list)
- **Batch Processing**: Efficiently handle large collections of document archives by applying appropriate processing techniques
- **Digital Library Processing**: Support for historical text document digitization projects
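
To illustrate the routing use case, the snippet below branches on the top predicted label. It is a sketch only: `process_with_segmentation` and `process_standard` are hypothetical placeholders for your own OCR paths.

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

def route_document(image_path, process_with_segmentation, process_standard):
    """Send an image down the segmentation-aware or the standard OCR path."""
    top_prediction = pipe(image_path)[0]  # pipeline output is sorted by score
    if top_prediction["label"] == "segment":
        return process_with_segmentation(image_path)
    return process_standard(image_path)
```
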
## Training and evaluation data

The model was fine-tuned on the `taresco/newspaper_ocr` dataset, which contains newspaper images labeled as either `segment` or `no_segment`.

Dataset splits:

- Training set: 19,111 examples, with 15% of this split held out as a validation set during training
- Test set: 4,787 examples
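
A comparable split can be reconstructed as follows. This is a sketch and assumes the dataset exposes `train` and `test` splits; the actual configuration may differ.

```python
from datasets import load_dataset

ds = load_dataset("taresco/newspaper_ocr")

# Hold out 15% of the training split for validation, mirroring the setup above
split = ds["train"].train_test_split(test_size=0.15, seed=42)
train_ds, val_ds, test_ds = split["train"], split["test"], ds["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```
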
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-5
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- num_epochs: 3
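
For reference, here is a minimal fine-tuning sketch that mirrors these hyperparameters. It is not the exact training script: the `image`/`label` column names, the preprocessing details, and the use of the `evaluate` library are assumptions, and `train_ds`/`val_ds` come from the dataset sketch above.

```python
import numpy as np
import torch
import evaluate
from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                          Trainer, TrainingArguments)

labels = ["no_segment", "segment"]
processor = AutoImageProcessor.from_pretrained("nvidia/mit-b0")
model = AutoModelForImageClassification.from_pretrained(
    "nvidia/mit-b0",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
    ignore_mismatched_sizes=True,  # swap the pretrained head for a 2-class head
)

def transform(batch):
    # The SegFormer processor resizes to 512x512 and normalizes the images
    inputs = processor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

train_ds.set_transform(transform)
val_ds.set_transform(transform)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, references = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1), references=references)

args = TrainingArguments(
    output_dir="newspaper_classifier_segformer",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    remove_unused_columns=False,  # keep the raw "image" column for the on-the-fly transform
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
)
trainer.train()
```
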
### Training results

The model achieved the following results on the evaluation set:

- Loss: 0.0198
- Accuracy: 99.62%

Classification report on the test set:

```text
              precision    recall  f1-score   support

  no_segment       1.00      0.99      1.00      4471
     segment       0.91      0.98      0.95       316

    accuracy                           0.99      4787
   macro avg       0.95      0.99      0.97      4787
weighted avg       0.99      0.99      0.99      4787
```
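
A report in this format can be regenerated with scikit-learn. The sketch below assumes the `test` split and the `image`/`label` columns used earlier, and that the dataset's label ids follow the same mapping as the model config.

```python
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")
id2label = pipe.model.config.id2label

test_ds = load_dataset("taresco/newspaper_ocr", split="test")

# Predict the top label for every test image and compare against the references
predictions = [out[0]["label"] for out in pipe(test_ds["image"], batch_size=16)]
references = [id2label[i] for i in test_ds["label"]]

print(classification_report(references, predictions, digits=2))
```
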
## How to Use

You can use this model with the Hugging Face `transformers` library:

```python
from transformers import pipeline

# Load the pipeline
pipe = pipeline("image-classification", model="taresco/newspaper_classifier_segformer")

# Classify an image
image_path = "path_to_your_image.jpg"
result = pipe(image_path)
print(result)
```
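
If you need explicit control over preprocessing and scores, the lower-level API can be used instead. This is a sketch and assumes the checkpoint ships its image processor configuration:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("taresco/newspaper_classifier_segformer")
model = AutoModelForImageClassification.from_pretrained("taresco/newspaper_classifier_segformer")
model.eval()

image = Image.open("path_to_your_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")  # resizes and normalizes to the expected 512x512 input

with torch.no_grad():
    logits = model(**inputs).logits

probabilities = logits.softmax(dim=-1)[0]
predicted_label = model.config.id2label[int(probabilities.argmax())]
print(predicted_label, float(probabilities.max()))
```
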
### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0