Index card datasets for training and evaluating models that convert index cards to structured data and metadata
AI & ML interests
A 🤗 Hugging Face x 🌸 BigScience initiative to create open-source community resources for libraries, archives, and museums (LAMs).
Organization Card
📚 BigLAM: Machine Learning for Libraries, Archives, and Museums
BigLAM is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for Libraries, Archives, and Museums (LAMs).
We aim to:
- 🗃️ Share machine-learning-ready datasets from LAMs via the Hugging Face Hub
- 🤖 Train and release open-source models for LAM-relevant tasks
- 🛠️ Develop tools and approaches tailored to LAM use cases
✨ Background
BigLAM began as a datasets hackathon within the BigScience 🌸 project, a large-scale, open NLP collaboration.
Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.
📂 What You'll Find
The BigLAM organization hosts:
- Datasets: image, text, and tabular data from and about libraries, archives, and museums
- Models: fine-tuned for tasks like:
  - Art/historical image classification
  - Document layout analysis and OCR
  - Metadata quality assessment
  - Named entity recognition in heritage texts
- Spaces: tools for interactive exploration and demonstration
🧩 Get Involved
We welcome contributions! You can:
- Use our datasets and models
- Join the discussion on GitHub
- Contribute your own tools or data
- Share your work using BigLAM resources
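As a starting point for the first item, here is a minimal sketch of previewing a BigLAM dataset with the Hugging Face `datasets` library. The helper names `repo_id` and `stream_examples` are hypothetical, and the `"train"` split name is an assumption; check the dataset card for the splits a given dataset actually provides.

```python
from itertools import islice

ORG = "biglam"  # organization namespace on the Hugging Face Hub


def repo_id(name: str) -> str:
    """Return the full Hub repo ID, adding the 'biglam/' prefix if missing."""
    name = name.strip()
    return name if name.startswith(f"{ORG}/") else f"{ORG}/{name}"


def stream_examples(name: str, split: str = "train", limit: int = 3):
    """Yield the first `limit` examples without downloading the full dataset.

    Assumes the dataset has a split called `split` ("train" is a guess).
    """
    # Imported lazily so repo_id() works without the `datasets` package.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches rows on demand.
    dataset = load_dataset(repo_id(name), split=split, streaming=True)
    yield from islice(dataset, limit)
```

Calling `for example in stream_examples("sloane-index-cards"): print(example.keys())` would list the fields of the first few records. Streaming keeps the preview cheap for the larger datasets in the organization.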
🌍 Why It Matters
Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:
- Supporting inclusive and responsible AI
- Helping institutions experiment with ML for access, discovery, and preservation
- Ensuring that ML systems reflect diverse human knowledge and expression
- Developing tools and methods that work well with the unique formats, values, and needs of LAMs
Models (6)
- biglam/historic-newspaper-illustrations-yolov11 • Object Detection • 10
- biglam/medieval-manuscript-yolov11 • Object Detection • 4
- biglam/detr-resnet-50_fine_tuned_loc-2023 • Object Detection • 41.6M • 57 • 2
- biglam/detr-resnet-50_fine_tuned_nls_chapbooks • Object Detection • 41.6M • 71 • 6
- biglam/cultural_heritage_metadata_accuracy • Text Classification • 0.1B • 4 • 3
- biglam/autotrain-beyond-the-books • Text Classification • 0.1B • 3
Datasets (38)
- biglam/doab-metadata-extraction • 8.09k • 268 • 11
- biglam/harvard-library-bibliographic-dataset • 11.1M • 63 • 2
- biglam/rubenstein-manuscript-catalog • 49.7k • 108 • 2
- biglam/bpl-card-catalog • 838k • 224 • 4
- biglam/brill_iconclass • 87.7k • 73 • 8
- biglam/sloane-index-cards • 2.73k • 45 • 1
- biglam/newspaper-navigator • 6.55M • 102 • 10
- biglam/loc_beyond_words • 3.56k • 48 • 13
- biglam/europeana_newspapers • 11.9M • 215 • 54
- biglam/european_art • 15.2k • 182 • 19