Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 11588 `inputs` tokens and 2400 `max_new_tokens`

#9
by GollyJer - opened

I'm unable to process this image when calling my hosted endpoint. I get the error...

openai.APIError: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 11588 `inputs` tokens and 2400 `max_new_tokens`

This is the image: https://m.media-amazon.com/images/I/81xwfM+g1VL._AC_SL1500_.jpg
I've tested it at https://olmocr.allenai.org/ and it works amazingly well.
I assume the token count comes from the image getting converted to base64 behind the scenes?

How do I remove the limitation? Thanks!

I had similar issues with other OCR models; let me know if you find a solution. I think it's related to the model's training parameters (not sure!).

Can you share your code, please? Most likely your document has a very long "document-anchoring" prompt (see the tech report). In the web demo and in the pipeline we automatically shrink that prompt when it gets too long:

e.g. `anchor_text = get_anchor_text("./paper.pdf", 1, pdf_engine="pdfreport", target_length=4000)`

Set a lower value for `target_length`; try 1000 characters.
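
For reference, here's a minimal runnable sketch of that fix. It assumes the import paths `olmocr.prompts.anchor.get_anchor_text` and `olmocr.prompts.build_finetuning_prompt` from the olmocr package, and a hypothetical local file `./paper.pdf`; double-check both against your installed version.

```python
# Minimal sketch: shrink the document-anchoring prompt so the request
# fits under the endpoint's 4096-token limit.
from olmocr.prompts import build_finetuning_prompt   # assumed import path
from olmocr.prompts.anchor import get_anchor_text    # assumed import path

# Extract the anchor text for page 1 of the PDF, capped at 1000 characters
# to leave room for the image tokens and max_new_tokens.
anchor_text = get_anchor_text(
    "./paper.pdf",           # hypothetical path to your source PDF
    1,                       # page number
    pdf_engine="pdfreport",
    target_length=1000,      # lowered from 4000, per the suggestion above
)

# Build the final prompt sent to the model alongside the rendered page image.
prompt = build_finetuning_prompt(anchor_text)
```

Lowering `target_length` trades some layout context for headroom in the context window, so the combined `inputs` tokens + `max_new_tokens` stay under 4096.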

Hi! Thanks for the inquiry. We’re currently working on closing out old tickets, so we’re closing this out for now, but if you’d still like an answer, please re-open and we will get back to you!

baileyk changed discussion status to closed
