How to Use olmOCR GGUF Model with Ollama?

by koala8104

Hi,

I've downloaded the olmOCR GGUF model and added it to Ollama (running on localhost:11434), but I'm struggling to get it working properly.

Could someone share:

  1. The correct prompt format for olmOCR with Ollama
  2. How to convert PDFs to images and send them to the model
  3. A simple code example showing how to use it

I've read the GitHub repo but still haven't managed to make it work.
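For reference, the kind of call I've been attempting looks roughly like this (a rough sketch only: it assumes pdf2image for rendering the PDF page and Ollama's standard /api/generate endpoint with a base64-encoded image; sample.pdf and the model name olmocr are just placeholders):

import base64
import io

import requests
from pdf2image import convert_from_path  # requires poppler to be installed

# Render the first page of the PDF to a PIL image
page = convert_from_path("sample.pdf", dpi=200)[0]

# Encode the page as a base64 PNG for Ollama's API
buf = io.BytesIO()
page.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode()

# Send the image plus a prompt to the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "olmocr",
        "prompt": "Extract the text of this page as markdown.",
        "images": [image_b64],
        "stream": False,
    },
)
print(resp.json()["response"])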

Thanks!

I have the same question. Can someone share how to use olmOCR with Ollama?

I also could not get it to OCR images, and for some reason it did not work for me even as a plain LLM (it returned random text, which indicates a wrong prompt structure).
To fix the chat behavior I ran ollama create olmocr -f Modelfile.
The Modelfile I used:

FROM olmOCR-7B-0225-preview-Q5_K_M.gguf
TEMPLATE """{{- if .Messages }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
<|im_start|>{{ .Role }}
{{ .Content }}
{{- if $last }}
{{- if (ne .Role "assistant") }}<|im_end|>
<|im_start|>assistant
{{ end }}
{{- else }}<|im_end|>
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""

SYSTEM You are a helpful assistant.
PARAMETER temperature 0.1

but it still says that it does not see images.
I figured out that vision models in the Ollama repos use a projector model as a second GGUF. I tried using the projector GGUF from Qwen2-VL 7B, but the Ollama CLI said "Error: invalid file magic".
Could you release the projector separately, deploy your model to Ollama, or provide a better solution?
Thanks for the great OCR model.
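For context, what I was hoping to reproduce is the usual projector pattern, e.g. the way llama-cpp-python loads LLaVA-style models with a separate mmproj GGUF (a sketch only; mmproj-olmocr.gguf is a hypothetical file that doesn't exist yet, and a Qwen2-VL model would likely need a different chat handler):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The projector GGUF maps image features into the language model's embedding space
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-olmocr.gguf")  # hypothetical file

llm = Llama(
    model_path="olmOCR-7B-0225-preview-Q5_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,
)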

Has anyone found a solution?

This would be immensely useful if it could natively work with Ollama and Open WebUI for testing.

Hey guys, the olmOCR toolkit currently uses SGLang, which doesn't support GGUF models. If you're loading the model with transformers, try a different dtype (unlikely to work, because Qwen2VLForConditionalGeneration doesn't allow int8 or float8_* dtypes).
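For example, something along these lines (a sketch; bfloat16 shown, since the int8/float8 variants are rejected by that class, and the Hub model id is assumed to be allenai/olmOCR-7B-0225-preview):

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load the model in a dtype the class accepts; swap the dtype here to experiment
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "allenai/olmOCR-7B-0225-preview",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("allenai/olmOCR-7B-0225-preview")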

FROM /root/.ollama/olmOCR-7B-0225-preview-Q5_0.gguf

TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>

SYSTEM """You are an olmOCR expert that extracts content from documents into Markdown format."""

{"id": "2741730785ca425ede1db569970114f7c843988a", "text": "I'm sorry, but I can't assist with that.", "source": "olmocr", "added": "2025-06-11", "created": "2025-06-11", "metadata": {"Source-File": "/local_files/\u6210\u90fd\u94f6\u884c2024\u5e74\u5ea6\u5e74\u62a5252.pdf", "olmocr-version": "0.1.71", "pdf-total-pages": 1, "total-input-tokens": 0, "total-output-tokens": 8, "total-fallback-pages": 0}, "attributes": {"pdf_page_numbers": [[0, 40, 1]]}}

The Ollama GGUF doesn't work either. Anything to improve? Or what is the original olmOCR Modelfile?

Hey @willshanghai, GGUF is not yet supported. But please visit the GitHub repo and switch to the jake/vllm_perf branch, where you can run vLLM instead of SGLang and pass a dtype inside the pipeline.py file for quantized inference.
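The change inside pipeline.py basically amounts to passing a dtype where the vLLM engine is constructed, roughly like this (a sketch; the actual argument names used on that branch may differ):

from vllm import LLM

# Lowering dtype (e.g. "float16" or "bfloat16") reduces memory use for the full-precision weights
llm = LLM(model="allenai/olmOCR-7B-0225-preview", dtype="bfloat16")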
