---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
---

<img alt="olmOCR Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmocr/olmocr.png" width="242px" style="margin-left:auto;margin-right:auto;display:block">

# olmOCR-7B-1025-FP8

An FP8-quantized version of [olmOCR-7B-1025](https://huggingface.co/allenai/olmOCR-7B-1025), produced using llmcompressor.

This is a release of the olmOCR model, fine-tuned from Qwen2.5-VL-7B-Instruct on the
[olmOCR-mix-1025](https://huggingface.co/datasets/allenai/olmOCR-mix-1025) dataset. It has additionally been
fine-tuned with GRPO reinforcement learning to boost its performance on math equations, tables, and other tricky OCR cases.

Quick links:
- 📃 [Paper](https://olmocr.allenai.org/papers/olmocr.pdf)
- 🤗 [Dataset](https://huggingface.co/datasets/allenai/olmOCR-mix-1025)
- 🛠️ [Code](https://github.com/allenai/olmocr)
- 🎮 [Demo](https://olmocr.allenai.org/)

The best way to use this model is via the [olmOCR toolkit](https://github.com/allenai/olmocr).
The toolkit provides an efficient inference setup via vLLM that can process millions of documents
at scale.


## olmOCR-Bench Scores

<table>
  <thead>
    <tr>
      <th align="left"><strong>Model</strong></th>
      <th align="center">ArXiv</th>
      <th align="center">Old Scans Math</th>
      <th align="center">Tables</th>
      <th align="center">Old Scans</th>
      <th align="center">Headers and Footers</th>
      <th align="center">Multi column</th>
      <th align="center">Long tiny text</th>
      <th align="center">Base</th>
      <th align="center">Overall</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left">olmOCR pipeline v0.4.0 with olmOCR-7B-1025-FP8</td>
      <td align="center"><strong>83.0</strong></td>
      <td align="center"><strong>82.3</strong></td>
      <td align="center"><strong>77.7</strong></td>
      <td align="center"><strong>47.7</strong></td>
      <td align="center">96.1</td>
      <td align="center"><strong>83.7</strong></td>
      <td align="center"><strong>84.6</strong></td>
      <td align="center"><strong>99.8</strong></td>
      <td align="center"><strong>82.4 ± 1.1</strong></td>
    </tr>
  </tbody>
</table>


## Usage

This model expects as input a single document image, rendered so that its longest dimension is 1288 pixels.

The prompt must also contain metadata about the document; the easiest way to generate it
is to use the methods provided by the [olmOCR toolkit](https://github.com/allenai/olmocr).
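
For example, scaling a page image so its longest side is exactly 1288 pixels can be sketched with Pillow; this is only an illustration, as the olmOCR toolkit ships its own page-rendering utilities:

```python
from PIL import Image

TARGET_LONGEST_DIM = 1288  # longest-side size the model expects


def resize_to_longest_dim(img: Image.Image, target: int = TARGET_LONGEST_DIM) -> Image.Image:
    """Scale an image so its longest side equals `target`, preserving aspect ratio."""
    w, h = img.size
    scale = target / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)


# Stand-in for a rendered PDF page (e.g. a US-letter page at ~200 DPI).
page = Image.new("RGB", (1700, 2200), "white")
resized = resize_to_longest_dim(page)
print(resized.size)  # (995, 1288) — longest side is now 1288
```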


## Manual Usage

If you must run the model as a one-off, follow the instructions below.

Note: it is important to keep the prompt and image dimensions exactly as specified, or performance may drop below the benchmark numbers we report.
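
As a minimal sketch (not the toolkit's exact code), an OpenAI-style chat payload for a server hosting this model might be assembled as follows. The `prompt_text` below is a hypothetical placeholder: the real per-page prompt includes document metadata and should be generated with the olmOCR toolkit's helpers.

```python
import base64
import io

from PIL import Image


def image_to_base64_png(img: Image.Image) -> str:
    """Encode a PIL image as a base64 PNG string for a data URL."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")


# Stand-in page image; in practice render the PDF page at 1288 px on its
# longest side, as described in the Usage section above.
page = Image.new("RGB", (995, 1288), "white")

# Hypothetical placeholder prompt — the real prompt comes from the olmOCR toolkit.
prompt_text = "Attached is one page of a document. Return the text content of the page."

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_to_base64_png(page)}"},
            },
            {"type": "text", "text": prompt_text},
        ],
    }
]
# `messages` can now be sent to a chat/completions endpoint serving this model.
```

Assuming a recent vLLM build with Qwen2.5-VL support, such an endpoint can typically be started with `vllm serve allenai/olmOCR-7B-1025-FP8`.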


## License and use

olmOCR is licensed under the Apache 2.0 license.
olmOCR is intended for research and educational use.
For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).