How to correctly determine the coordinates for this prompt: "OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237>"
#6 opened by Anaudia
I would like to use the model on specific parts of my image, but I am not sure how to convert the bounding boxes I have into the <loc_…> tokens used in the prompt.
Hello, thanks for pointing this out. Perhaps we need to make a helper function more visible somewhere. For now, the demo app contains a function that takes either normalized or pixel coordinates in [xmin, ymin, xmax, ymax] order and produces the location tokens:
https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo/blob/12df581e7fb68a527eb8e857c6a1caea6da3828c/app.py#L35
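For anyone who just wants the idea without opening the demo, below is a minimal sketch of such a conversion. It is not the demo's code: `bbox_to_loc_tokens` is a hypothetical name, and it assumes the loc values lie on a 0 to 500 grid (`grid=500`) with the tokens emitted in the same [xmin, ymin, xmax, ymax] order as the input. Check both assumptions against the demo's function before relying on them.

```python
from typing import Optional, Sequence, Tuple


def bbox_to_loc_tokens(
    bbox: Sequence[float],
    image_size: Optional[Tuple[int, int]] = None,
    grid: int = 500,  # ASSUMPTION: loc values span 0..grid; adjust if your model differs
) -> str:
    """Hypothetical helper: turn [xmin, ymin, xmax, ymax] into <loc_N> tokens.

    If image_size=(width, height) is given, bbox is read as pixel coordinates;
    otherwise it is read as normalized coordinates in [0, 1].
    """
    xmin, ymin, xmax, ymax = bbox
    if image_size is not None:
        width, height = image_size
        # Normalize pixel coordinates by image width (x) and height (y).
        xmin, xmax = xmin / width, xmax / width
        ymin, ymax = ymin / height, ymax / height

    values = (xmin, ymin, xmax, ymax)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError(f"bbox {bbox} falls outside the image")
    if xmin > xmax or ymin > ymax:
        raise ValueError(f"bbox {bbox} is not in [xmin, ymin, xmax, ymax] order")

    # Scale each coordinate to the integer grid; ASSUMED token order: xmin, ymin, xmax, ymax.
    return "".join(f"<loc_{round(v * grid)}>" for v in values)


if __name__ == "__main__":
    # A 1000x1000 px image with a box from pixel (310, 466) to (412, 474).
    tokens = bbox_to_loc_tokens([310, 466, 412, 474], image_size=(1000, 1000))
    print(f"OCR the text in a specific location: {tokens}")
    # prints "OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237>"
```

Normalizing x by width and y by height independently means non-square images work without extra handling; if your boxes are already normalized, just omit `image_size`.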
asnassar changed discussion status to closed