What image size are the coordinates based on? And \n is stripped

#21
by wuutiing2 - opened

I have two questions:
1. Does the output <loc> tag represent a coordinate? What are the coordinates based on, a 512-pixel image? I tested this, but there are some deviations.
2. \n characters are stripped within a single <text> element. \n is important in OCR because it affects the typesetting. Is there a version that keeps \n?

Docling org
  1. The location tags are normalized to 500 as mentioned in the paper.
  2. I don't quite understand that. But if I understand you correctly, if something is already on a new line it will be a separate
asnassar changed discussion status to closed
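
To make the first answer concrete, here is a minimal sketch of mapping the four `<loc_N>` values back to pixels, assuming they sit on a grid of 500 in both directions as stated above. The function name `loc_to_pixels` and the (left, top, right, bottom) ordering are assumptions for illustration, not something confirmed in this thread.

```python
def loc_to_pixels(loc, image_width, image_height, grid=500):
    """Scale four normalized <loc_N> values to pixel coordinates.

    Assumption: the four values are ordered (left, top, right, bottom)
    and are normalized to `grid` (500, per the reply above).
    """
    x1, y1, x2, y2 = loc
    sx = image_width / grid   # horizontal pixels per grid unit
    sy = image_height / grid  # vertical pixels per grid unit
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


# Hypothetical image size, chosen only to show the arithmetic.
box = loc_to_pixels((94, 134, 415, 252), image_width=1000, image_height=2000)
print(box)  # (188.0, 536.0, 830.0, 1008.0)
```

If the values are instead treated as pixels in a 512-pixel image, every coordinate ends up off by a couple of percent, which may account for the deviations mentioned in the question.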

For example, for one image, one of the outputs looks like this:

<text><loc_94><loc_134><loc_415><loc_252>Special Event Devoted to International Day of</text>

but I would like it to be:

<text><loc_94><loc_134><loc_415><loc_252>Special Event\nDevoted to\nInternational\nDay of</text>

Is there a version that keeps \n?
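
For anyone working with this output in the meantime, a `<text>` element like the ones quoted above can be split into its four location values and its text with a small regular expression. This is only a sketch: `parse_text_element` is a hypothetical helper, and it assumes each element has exactly four `<loc_N>` tags followed by the text. It does not recover line breaks that the model did not emit.

```python
import re

# One <text> element: four <loc_N> tags, then the text content.
TEXT_ELEMENT = re.compile(r"<text>((?:<loc_\d+>){4})(.*?)</text>", re.DOTALL)


def parse_text_element(element):
    """Return ((loc1, loc2, loc3, loc4), text), or None if the format differs."""
    match = TEXT_ELEMENT.fullmatch(element.strip())
    if match is None:
        return None
    locs = tuple(int(n) for n in re.findall(r"<loc_(\d+)>", match.group(1)))
    return locs, match.group(2)


sample = (
    "<text><loc_94><loc_134><loc_415><loc_252>"
    "Special Event Devoted to International Day of</text>"
)
print(parse_text_element(sample))
```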
