allenai
/

olmOCR-2-7B-1025-FP8

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+library_name: transformers
+---
+<img alt="olmOCR Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmocr/olmocr.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">
+# olmOCR-7B-1025-FP8
+Quantized to FP8 Version of [olmOCR-7B-1025](https://huggingface.co/allenai/olmOCR-7B-1025), using llmcompressor.
+This is a release of the olmOCR model that's fine tuned from Qwen2.5-VL-7B-Instruct using the
+[olmOCR-mix-1025](https://huggingface.co/datasets/allenai/olmOCR-mix-1025) dataset. It has been additionally
+fine tuned using GRPO RL training to boost its performance at math equations, tables, and other tricky OCR cases.
+Quick links:
+- 📃 [Paper](https://olmocr.allenai.org/papers/olmocr.pdf)
+- 🤗 [Dataset](https://huggingface.co/datasets/allenai/olmOCR-mix-1025)
+- 🛠️ [Code](https://github.com/allenai/olmocr)
+- 🎮 [Demo](https://olmocr.allenai.org/)
+The best way to use this model is via the [olmOCR toolkit](https://github.com/allenai/olmocr).
+The toolkit comes with an efficient inference setup via VLLM that can handle millions of documents
+at scale.
+## olmOCR-Bench Scores
+<table>
+  <thead>
+    <tr>
+      <th align="left"><strong>Model</strong></th>
+      <th align="center">ArXiv</th>
+      <th align="center">Old Scans Math</th>
+      <th align="center">Tables</th>
+      <th align="center">Old Scans</th>
+      <th align="center">Headers and Footers</th>
+      <th align="center">Multi column</th>
+      <th align="center">Long tiny text</th>
+      <th align="center">Base</th>
+      <th align="center">Overall</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td align="left">olmOCR pipeline v0.4.0 with olmOCR-7B-1025-FP8</td>
+      <td align="center"><strong>83.0</strong></td>
+      <td align="center"><strong>82.3</strong></td>
+      <td align="center"><strong>77.7</strong></td>
+      <td align="center"><strong>47.7</strong></td>
+      <td align="center">96.1</td>
+      <td align="center"><strong>83.7</strong></td>
+      <td align="center"><strong>84.6</strong></td>
+      <td align="center"><strong>99.8</strong></td>
+      <td align="center"><strong>82.4 ± 1.1</strong></td>
+    </tr>
+  </tbody>
+</table>
+## Usage
+This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels.
+The prompt must then contain the additional metadata from the document, and the easiest way to generate this
+is to use the methods provided by the [olmOCR toolkit](https://github.com/allenai/olmocr).
+## Manual Usage
+If you must run the model as a one-off, please follow the instructions below.
+Note: It is important to keep the prompt and image dimensions exactly as specified, or else performance may drop from the benchmark numbers we report.
+## License and use
+olmOCR is licensed under the Apache 2.0 license.
+olmOCR is intended for research and educational use.
+For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).