How to correctly determine the coordinates for this prompt: "OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237>"

by Anaudia - opened 11 days ago

11 days ago

I would like to run the model on specific regions of my image, but I am not sure how to convert the bounding boxes I have into the loc tokens used in the prompt.

asnassar

Docling org 11 days ago

Hello, thanks for pointing this out. We should probably put a helper function somewhere more visible. In the meantime, the demo has a function that takes either normalized or pixel coordinates in [xmin, ymin, xmax, ymax] order:
https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo/blob/12df581e7fb68a527eb8e857c6a1caea6da3828c/app.py#L35
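
For a rough idea of the conversion, here is a minimal sketch. It is not the demo's code: the function name `bbox_to_loc_tokens` is made up for illustration, and it assumes the loc tokens index a 0 to 500 grid along each axis (the `grid_size` default), so please check it against the linked function before relying on it.

```python
def bbox_to_loc_tokens(bbox, image_width, image_height, grid_size=500):
    """Convert a pixel bbox [xmin, ymin, xmax, ymax] into a loc-token string.

    Hypothetical helper. Assumes the model expects coordinates rescaled to
    a 0..grid_size grid on each axis (grid_size=500 is an assumption).
    For normalized coords in [0, 1], pass image_width=1 and image_height=1.
    """
    xmin, ymin, xmax, ymax = bbox
    if xmin > xmax or ymin > ymax:
        raise ValueError("bbox must be ordered as [xmin, ymin, xmax, ymax]")

    def scale(value, extent):
        # Rescale to the grid, round to the nearest cell, clamp to the valid range.
        index = round(value * grid_size / extent)
        return max(0, min(grid_size, index))

    coords = (
        scale(xmin, image_width),
        scale(ymin, image_height),
        scale(xmax, image_width),
        scale(ymax, image_height),
    )
    return "".join(f"<loc_{c}>" for c in coords)


# Example: a box on a 1000x1000 pixel image.
loc = bbox_to_loc_tokens((310, 466, 412, 474), 1000, 1000)
prompt = f"OCR the text in a specific location: {loc}"
```

Under these assumptions the example box maps to `<loc_155><loc_233><loc_206><loc_237>`, which matches the tokens in the prompt from the title.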

asnassar changed discussion status to closed 11 days ago
