Shadowed Checkboxes and Lines as Checks

#5
by ep5000 - opened

Love the model!

I noticed that checkboxes with a line drawn through them are not recognized as checked. Please see the following example:

Prompt:
Extract the checkboxes in the document and indicate if they are checked or not

Checkboxes:
image.png

Results:
The results all came back as ☐

Is there a way to handle this?

Nanonets org

We have not trained just on the boxes, but if you have some text beside the boxes (which is usually the case for a document), it should work. I added some text beside the image you have shared and it worked just fine.
Output
image.png

Input

Screenshot 2025-06-15 at 9.08.53 PM.png

I tried a similar example but got the following results (the model reports false positives):

Prompt (taken from transformers example):

prompt.jpg

Image:
cerveau1.jpg

Results:

☑ Option 1 27383
☐ Option 2 27383
☑ Option 3 27383
☐ Option 4 27383
☐ Option 5 27383
☐ Option 6 27383
☐ Option 7 27383
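
To compare output like this against the actual form, the result text can be turned into (label, checked) pairs and checked against a hand-made list of the boxes that are really ticked. The following is only a minimal sketch: `parse_checkboxes` and `false_positives` are hypothetical helper names, not part of the model or of transformers, and it assumes the model marks boxes with the ☑ and ☐ glyphs as in the results quoted in this thread.

```python
import re

# Glyphs seen in the results quoted in this thread (assumed to be the only markers).
CHECKED = "☑"
UNCHECKED = "☐"

# One entry = a box glyph followed by label text, up to the next glyph or end of line.
_ENTRY = re.compile(rf"([{CHECKED}{UNCHECKED}])\s*([^{CHECKED}{UNCHECKED}\n]*)")


def parse_checkboxes(text):
    """Return a list of (label, is_checked) pairs found in the model output."""
    pairs = []
    for glyph, label in _ENTRY.findall(text):
        pairs.append((label.strip(), glyph == CHECKED))
    return pairs


def false_positives(parsed, truly_checked):
    """Labels reported as checked that are not in the hand-made expected set."""
    return [label for label, checked in parsed if checked and label not in truly_checked]
```

For example, `false_positives(parse_checkboxes(output), expected)` would list the wrongly ticked labels, where `output` is the model's text and `expected` is a set of labels you filled in by looking at the image yourself.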

Adding two more rows of checkboxes seems to offset the results further:

Image:

cerveau1.jpg

Results:

☑ Option 1 27383 ☑ Option 8 27383
☐ Option 2 27383 ☑ Option 9 27383
☑ Option 3 27383 ☑ Option 10 27383
☐ Option 4 27383 ☑ Option 11 27383
☑ Option 5 27383 ☑ Option 12 27383
☐ Option 6 27383 ☑ Option 13 27383
☐ Option 7 27383 ☑ Other:

Nanonets org

Can you check the code in the official space demo? I am resizing the image to 2024 there, which gives the best performance and seems to be working on the files you have shared. You can also try removing parts of the prompt. For example, for structured tables, you can remove the part of the prompt where we ask for HTML tables. Be careful when changing the prompt, though: removing some features should be okay, but adding new text is riskier. Also, may I know which kind of documents you want to process, since these are dummy images?

image.png
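
For anyone using the transformers example instead of the demo, the resize step mentioned above could be approximated as follows. This is a sketch under an assumption: the thread only says the image is resized "to 2024", so scaling the longer edge to 2024 pixels while keeping the aspect ratio is a guess, and the demo's own code is the authority on what it really does. `target_size` is a hypothetical helper name.

```python
def target_size(width, height, longest_side=2024):
    """Scale (width, height) so the longer edge equals longest_side.

    Assumption: "resizing to 2024" means the longer edge; the aspect ratio is kept.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = longest_side / max(width, height)
    # Round to whole pixels and never return a zero-sized edge.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, the returned tuple can be passed to `Image.resize` before the image is handed to the processor.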

Currently I'm using the transformers code example. I'll try the Hugging Face demo Space and report back here.

The types of documents being processed are medical patient referrals such as the following: https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/3035179841/original/vM87t_NQ0Ix_LRhfgzb3MN2E0hJpBeFNIQ.png?1521052676

These medical referrals typically have a number of variable-format checkboxes that are often filled in by hand (i.e. with a line drawn through the checkbox or checkboxes).

Using the Hugging Face demo, I received the following results (all deemed unchecked):

Input Image:

image.png

Result:
Page 1 of 1

Option 1 27383 Option 8 27383
Option 2 27383 Option 9 27383
Option 3 27383 Option 10 27383
Option 4 27383 Option 11 27383
Option 5 27383 Option 12 27383
Option 6 27383 Option 13 27383
Option 7 27383 Other:
