Image-Text-to-Text
Safetensors
qwen2_5_vl
historical
conversational

CHURRO Logo

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Model Dataset Paper GitHub Stars

Handwritten and printed text recognition across 22 centuries and 46 language clusters, including historical and dead languages.

Cost vs Performance comparison showing CHURRO's accuracy advantage at significantly lower cost
Cost vs. accuracy: CHURRO (3B) achieves higher accuracy than much larger commercial and open-weight VLMs while being substantially cheaper.

CHURRO is a 3B-parameter open-weight vision-language model (VLM) for historical document transcription. It is trained on CHURRO-DS, a curated dataset of ~100K pages from 155 historical collections spanning 22 centuries and 46 language clusters. On the CHURRO-DS test set, CHURRO delivers 15.5× lower cost than Gemini 2.5 Pro while exceeding its accuracy.

For more details and code see https://github.com/stanford-oval/Churro.

Downloads last month
525
Safetensors
Model size
3.75B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for stanford-oval/churro-3B

Finetuned
(500)
this model
Quantizations
1 model

Dataset used to train stanford-oval/churro-3B

Collection including stanford-oval/churro-3B