Data Lab at the Danish National Archives

The Danish National Archives (Rigsarkivet) preserves and provides access to the records of Danish society. Within the Archives, the Data Science department works to make collections more accessible and usable with data-driven methods and technologies. A key part of this effort is the Data Lab, a team dedicated to AI-based processing of historical documents.

We aim to share as much of our annotated training data as possible for Handwritten Text Recognition (HTR) and related historical document analysis tasks. Our goal is to support both researchers and developers working with historical collections, whether they are training machine learning models or improving access to archival material.

Our training datasets span a variety of sources and formats: PAGE and ALTO files from 18th-century parish records, annotated images of early 20th-century death certificates, and more. While our primary focus is HTR, our work also touches on layout analysis, segmentation, and named entity recognition. We publish only datasets that are general enough to be reused in other contexts.
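For readers new to the PAGE format mentioned above, the following is a minimal Python sketch of how line polygons and transcriptions might be read from a PAGE XML file. The embedded sample document, the `read_page_lines` helper, and the keys of the returned dictionaries are illustrative assumptions and are not taken from the published datasets. Real files may use a different schema version or contain additional elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in PAGE XML layout; not taken from any published dataset.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,10 200,10 200,40 10,40"/>
        <TextEquiv><Unicode>Anno 1765</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <Coords points="10,50 220,50 220,80 10,80"/>
        <TextEquiv><Unicode>Hans Jensen</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_page_lines(xml_text):
    """Return one dict per TextLine with its id, polygon, and transcription.

    Assumes a namespaced PAGE document; the namespace is read from the root
    tag so that different schema versions can be handled the same way.
    """
    root = ET.fromstring(xml_text)
    # The root tag has the form "{namespace-uri}PcGts".
    ns = {"pc": root.tag[1:].split("}")[0]}

    lines = []
    for line in root.iterfind(".//pc:TextLine", ns):
        coords = line.find("pc:Coords", ns)
        # Only the line-level transcription is read (direct child TextEquiv).
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", ns)

        polygon = []
        if coords is not None:
            # "points" holds space-separated "x,y" pairs.
            for pair in coords.get("points", "").split():
                x, y = pair.split(",")
                polygon.append((int(x), int(y)))

        text = unicode_el.text if unicode_el is not None else None
        lines.append({"id": line.get("id"), "polygon": polygon, "text": text or ""})
    return lines


if __name__ == "__main__":
    for entry in read_page_lines(SAMPLE_PAGE_XML):
        print(entry["id"], entry["text"], entry["polygon"])
```

Each returned entry pairs a transcription with the polygon outlining the line on the page image, which is typically the pairing an HTR training pipeline needs.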

We hope this material will be valuable to other archives, digital humanities researchers, and anyone working to unlock handwritten historical sources through machine learning.

We welcome collaborations

We are open to collaboration, especially where shared interests, capacity, and funding allow. By sharing our work, we also hope to encourage knowledge exchange and broader participation in developing AI tools for historical materials.

📌 Learn more

Visit the Danish National Archives (Rigsarkivet) for more about our collections and services.

📬 Contact

For questions, collaborations, or dataset inquiries, reach us at: 📧 Email
