# Avestan OCR Training and Application – Kraken + eScriptorium
This folder contains the full training and application pipeline for Avestan OCR using Kraken and eScriptorium. It handles image segmentation, recognition model training, and ALTO XML output generation. Outputs are later converted to CAB-compatible XML formats using tools from the `xml_translator/` module.
## Folder Structure

```
Applying_OCR/
├── Makefile          # Defines all Kraken training and evaluation targets
├── models/           # Stores trained segmentation and recognition models
│   ├── segmentation/
│   └── recognition/
```
## Workflow Overview

Hugging Face model cards don't render Mermaid diagrams. See a rendered version on GitHub, or expand the source below.

View rendered diagram on GitHub →
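
For reference, a rough plain-text sketch of the same workflow (illustrative only; the `make` targets are defined in the next section, and the final conversion step depends on the `xml_translator/` module's own entry points):

```bash
# 1. Segment and transcribe pages in eScriptorium, then export ALTO XML ground truth.
# 2. Train the segmentation and recognition models (targets defined in the Makefile below):
make train_seg
make train_recog
# 3. Apply the trained models to new page images to produce ALTO XML predictions
#    (see the example under "Input/Output Formats").
# 4. Convert the ALTO predictions to CAB-compatible XML using the xml_translator/ tools.
```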
## Makefile Targets

The `Makefile` defines the Kraken training and evaluation pipeline (training and evaluation use Kraken's `ketos` tool). Example targets include:

```makefile
train_seg:
	ketos segtrain -f alto -o models/segmentation/model data/segmentation/*.xml

train_recog:
	ketos train -f alto -o models/recognition/model data/recognition/*.xml

eval:
	ketos test -f alto -m models/recognition/model_best.mlmodel test/*.xml
```

Use `make train_seg`, `make train_recog`, or define your own targets for batch training/evaluation.
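
For example (an illustrative sketch; target names and paths are placeholders), a combined target and a simple batch evaluation loop could look like this:

```makefile
# Illustrative convenience target: run both training steps back to back.
train_all: train_seg train_recog

# Illustrative batch evaluation over every recognition checkpoint (paths are placeholders).
eval_all:
	for m in models/recognition/*.mlmodel; do \
		ketos test -f alto -m $$m test/*.xml; \
	done
```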
## Input/Output Formats

**Input:**

- Line-segmented manuscript images (from eScriptorium or the Kraken segmenter)
- ALTO XML files with gold-standard transcriptions

**Output:**

- `.mlmodel` files (segmentation + recognition)
- ALTO XML predictions from Kraken (see the example below)
- CAB-format XML via `xml_translator/`
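
As an illustration of how the ALTO predictions are produced (a sketch only; the exact flags depend on your Kraken version, and the file names are placeholders), a page image can be segmented and recognized with the trained models:

```bash
# Illustrative only: apply the trained models to one page image and serialize the result as ALTO XML.
# Check `kraken --help` for the exact options in your Kraken version.
kraken -a -i page_001.png page_001.xml \
	segment -bl -i models/segmentation/model.mlmodel \
	ocr -m models/recognition/model_best.mlmodel
```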
## Dependencies

This pipeline assumes you have:

- Kraken installed (`pip install kraken`); see the quick check below
- eScriptorium for GUI-assisted segmentation/transcription
- Pre-cleaned ALTO XML exported from eScriptorium
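
A minimal environment check (illustrative; eScriptorium itself runs as a separate web application and is not installed through this step):

```bash
# Install Kraken into the current Python environment and confirm both CLIs are available.
pip install kraken
kraken --help   # application / inference CLI
ketos --help    # training and evaluation CLI
```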