ToluClassics committed on
Commit 19f46b0 · verified
1 Parent(s): 385750e

Update README.md

Files changed (1)
  1. README.md +16 -23
README.md CHANGED
@@ -21,29 +21,22 @@ pipeline_tag: image-classification

  # newspaper_classifier_segformer

- This model is a fine-tuned version of `nvidia/mit-b0` on the `taresco/newspaper_ocr` dataset. It is designed to classify newspaper images into two categories: segment and no_segment.
-
-
- ## Model description
-
- The model is based on the SegFormer architecture, which is a lightweight and efficient transformer-based model for image classification and segmentation tasks. It has been fine-tuned specifically for the task of classifying newspaper images into the aforementioned categories.
-
- Key Features:
- - Architecture: SegFormer (nvidia/mit-b0), known for its efficiency and strong performance on image classification tasks.
- - Input Size: The model processes images resized to 512x512 pixels.
- - Output Classes:
- segment: Indicates the presence of two distinct unrelated news segments in the image.
- no_segment: Indicates the absence two or more distinct news segments in the image.
-
- ## Intended uses & limitations
-
- Intended Uses:
- - Newspaper Image Classification: The model is intended to classify newspaper images into segment and no_segment categories.
- - OCR Preprocessing: It can be used as a preprocessing step for OCR tasks to identify images that will require further text localization.
-
- Limitations:
- -Domain-Specific: The model is fine-tuned on the `taresco/newspaper_ocr` dataset and may not generalize well to other types of images or domains.
- - Image Quality: The model's performance may degrade on low-quality or noisy images.
+ This model is a fine-tuned version of `nvidia/mit-b0` on a document OCR dataset. It classifies text document images into two categories: those requiring special segmentation processing (`segment`) and those that don't (`no_segment`). This classification is a critical preprocessing step in our OCR pipeline, enabling optimized document processing paths.
+
+
+ ## Model Details
+ - **Base Architecture**: SegFormer (`nvidia/mit-b0`) - a transformer-based architecture that balances efficiency and performance for vision tasks
+ - **Training Dataset**: `taresco/document_ocr` - specialized collection of text document images with segmentation annotations
+ - **Input Format**: RGB images resized to 512×512 pixels
+ - **Output Classes**:
+ - `segment`: Images containing two or more distinct, unrelated text segments that require special OCR processing
+ - `no_segment`: Images containing single, cohesive content that can follow standard
+
+ ## Intended Uses & Applications
+ - **OCR Pipeline Integration**: Primary use is as a preprocessing classifier in OCR workflows for document digitization
+ - **Document Routing**: Automatically route documents to specialized segmentation processing when needed
+ - **Batch Processing**: Efficiently handle large collections of document archives by applying appropriate processing techniques
+ - **Digital Library Processing**: Support for historical text document digitization projects


  ## Training and evaluation data
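
The document-routing use described in the updated README text can be sketched roughly as follows. This is an illustration only, not code from the commit: `route_documents` and `make_hf_classifier` are hypothetical helper names, and the repository id passed to `make_hf_classifier` is an assumption, since the diff does not state the full model id. The sketch relies on the Hugging Face `transformers` image-classification pipeline and on the two labels named in the diff (`segment`, `no_segment`).

```python
from typing import Callable, Dict, Iterable, List

# Label names as they appear in the README diff.
SEGMENT = "segment"
NO_SEGMENT = "no_segment"


def route_documents(
    image_paths: Iterable[str],
    classify: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group image paths by the label that `classify` returns for each one."""
    routes: Dict[str, List[str]] = {SEGMENT: [], NO_SEGMENT: []}
    for path in image_paths:
        label = classify(path)
        if label not in routes:
            raise ValueError(f"unexpected label {label!r} for {path}")
        routes[label].append(path)
    return routes


def make_hf_classifier(model_id: str) -> Callable[[str], str]:
    """Wrap a transformers image-classification pipeline as a classify() callable.

    `model_id` is an assumption: pass whatever Hub repository id hosts the model.
    """
    # Imported lazily so route_documents can be used without transformers installed.
    from transformers import pipeline

    clf = pipeline("image-classification", model=model_id)

    def classify(path: str) -> str:
        # The pipeline returns a list of {"label", "score"} dicts, best score first.
        return clf(path)[0]["label"]

    return classify
```

Keeping the classifier behind a plain callable means the routing logic can be exercised with a stub, and the pipeline is only loaded when real predictions are needed. Paths grouped under `segment` would then go to the segmentation step, the rest to the regular OCR path.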