taresco/newspaper_classifier_segformer

Image Classification

Generated from Trainer


ToluClassics committed on Apr 30

Commit 19f46b0 · verified · 1 Parent(s): 385750e

Update README.md

Files changed (1)

README.md +16 -23

README.md CHANGED

@@ -21,29 +21,22 @@ pipeline_tag: image-classification
 # newspaper_classifier_segformer
-This model is a fine-tuned version of `nvidia/mit-b0` on the `taresco/newspaper_ocr` dataset. It is designed to classify newspaper images into two categories: segment and no_segment.
-## Model description
-The model is based on the SegFormer architecture, which is a lightweight and efficient transformer-based model for image classification and segmentation tasks. It has been fine-tuned specifically for the task of classifying newspaper images into the aforementioned categories.
-Key Features:
-- Architecture: SegFormer (nvidia/mit-b0), known for its efficiency and strong performance on image classification tasks.
-- Input Size: The model processes images resized to 512x512 pixels.
-- Output Classes:
-  segment: Indicates the presence of two distinct unrelated news segments in the image.
-  no_segment: Indicates the absence of two or more distinct news segments in the image.
-## Intended uses & limitations
-Intended Uses:
-- Newspaper Image Classification: The model is intended to classify newspaper images into segment and no_segment categories.
-- OCR Preprocessing: It can be used as a preprocessing step for OCR tasks to identify images that will require further text localization.
-Limitations:
-- Domain-Specific: The model is fine-tuned on the `taresco/newspaper_ocr` dataset and may not generalize well to other types of images or domains.
-- Image Quality: The model's performance may degrade on low-quality or noisy images.
 ## Training and evaluation data

 # newspaper_classifier_segformer
+This model is a fine-tuned version of `nvidia/mit-b0` on a document OCR dataset. It classifies text document images into two categories: those requiring special segmentation processing (`segment`) and those that don't (`no_segment`). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.
+## Model Details
+- **Base Architecture**: SegFormer (`nvidia/mit-b0`) - a transformer-based architecture that balances efficiency and performance for vision tasks
+- **Training Dataset**: `taresco/document_ocr` - specialized collection of text document images with segmentation annotations
+- **Input Format**: RGB images resized to 512×512 pixels
+- **Output Classes**:
+  - `segment`: Images containing two or more distinct, unrelated text segments that require special OCR processing
+  - `no_segment`: Images containing single, cohesive content that can follow standard
+## Intended Uses & Applications
+- **OCR Pipeline Integration**: Primary use is as a preprocessing classifier in OCR workflows for document digitization
+- **Document Routing**: Automatically route documents to specialized segmentation processing when needed
+- **Batch Processing**: Efficiently handle large collections of document archives by applying appropriate processing techniques
+- **Digital Library Processing**: Support for historical text document digitization projects
 ## Training and evaluation data
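
The document-routing use described in the updated card can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the checkpoint is published as `taresco/newspaper_classifier_segformer`, that its label names match the `segment` and `no_segment` classes listed in the card, and that a 0.5 score threshold is acceptable. The helper names `route_document`, `load_classifier`, and `classify_and_route`, and the path names `"segmentation"` and `"standard"`, are hypothetical.

```python
from typing import Dict, List

# Label names as listed in the model card; assumed to match the checkpoint's config.
SEGMENT_LABEL = "segment"
NO_SEGMENT_LABEL = "no_segment"


def route_document(predictions: List[Dict], threshold: float = 0.5) -> str:
    """Pick a processing path from image-classification scores.

    `predictions` is a list of {"label": str, "score": float} dicts, the shape
    returned by a transformers image-classification pipeline for one image.
    Returns "segmentation" when the `segment` score reaches the threshold,
    otherwise "standard". Both path names are hypothetical.
    """
    scores = {p["label"]: p["score"] for p in predictions}
    if scores.get(SEGMENT_LABEL, 0.0) >= threshold:
        return "segmentation"
    return "standard"


def load_classifier(model_id: str = "taresco/newspaper_classifier_segformer"):
    """Build the classifier once so it can be reused across a batch."""
    # Imported lazily so route_document works without transformers installed.
    from transformers import pipeline

    return pipeline("image-classification", model=model_id)


def classify_and_route(classifier, image_path: str, threshold: float = 0.5) -> str:
    """Classify one image file and return its processing path."""
    return route_document(classifier(image_path), threshold)
```

For a batch, one would call `load_classifier()` once and then `classify_and_route(classifier, path)` per file; the threshold is a tuning knob and would need to be checked against validation data.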