olmOCR
olmOCR is a document recognition pipeline for efficiently converting documents into plain text.
olmocr.allenai.org
This is the official FP8-quantized version of olmOCR-7B-0225-preview, for use with the olmOCR pipeline.
Make sure you have olmOCR v0.1.75 or newer installed, then run:
# Download a sample PDF
curl -o olmocr-sample.pdf https://olmocr.allenai.org/papers/olmocr_3pg_sample.pdf
# Convert it to markdown
python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf --model allenai/olmOCR-7B-0225-preview-FP8
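The FP8 checkpoint needs olmOCR v0.1.75 or newer, so it can help to confirm the installed version before running the commands above. The following is a minimal sketch, assuming olmOCR was installed with pip under the distribution name `olmocr` and that GNU `sort -V` is available; `version_at_least` and `check_olmocr_version` are hypothetical helper names, not part of olmOCR.

```shell
#!/usr/bin/env bash
# Sketch: confirm the installed olmOCR is at least v0.1.75 before using the FP8 model.
# Assumption: olmOCR was installed with pip under the distribution name "olmocr".

# version_at_least INSTALLED REQUIRED -> exit status 0 if INSTALLED >= REQUIRED.
# Uses `sort -V` (version sort from GNU coreutils): if REQUIRED sorts first, INSTALLED is not older.
version_at_least() {
  local lowest
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# check_olmocr_version -> prints a status line; exit status 0 only if the requirement is met.
check_olmocr_version() {
  local required="0.1.75" installed
  # Read the "Version:" field reported by pip; empty if python/pip or the package is missing.
  installed="$(python -m pip show olmocr 2>/dev/null | awk '/^Version:/ {print $2}' || true)"
  if [ -z "$installed" ]; then
    echo "olmocr does not appear to be installed" >&2
    return 1
  fi
  if version_at_least "$installed" "$required"; then
    echo "olmocr $installed is new enough (>= $required)"
  else
    echo "olmocr $installed is older than $required; please upgrade" >&2
    return 1
  fi
}

check_olmocr_version || echo "Install or upgrade olmOCR before running the pipeline." >&2
```

Comparing with `sort -V` rather than plain string comparison matters here: as strings, "0.1.100" would sort before "0.1.75", but version sort orders it correctly.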
olmOCR is licensed under the Apache 2.0 license. olmOCR is intended for research and educational use. For more information, please see our Responsible Use Guidelines.