nanonets/Nanonets-OCR-s

Jul 1

First of all thank you guys for your contribution to the open source community . This model is awesome and a great tool to unlock more training data stuck in pdfs. My collaborators and I are using it to extract text from out of copyright books.

In another thread I asked jina about using the same base model for their embedding model. They seem to have indicated that they got permission to build their model. Maybe yall could reach out and get the license of your model back to apache 2
https://huggingface.co/jinaai/jina-embeddings-v4/discussions/44

Souvik3333

Nanonets org Jul 1

Thanks @amazingvince . I have mailed them, let's see if we get the permission.

amazingvince changed discussion title from Lisense to License Jul 1

okayatul

Jul 7

what was the mail address?

artartarta

Jul 8

Без цензуры, на любые темы.

Souvik3333

Nanonets org Jul 8

@okayatul [email protected] It's mentioned in their GitHub.

okayatul

Jul 10

I'm trying to extract structured data from organizational charts in a JSON hierarchy format (i.e., mapping who reports to whom), but I haven't found any datasets of org charts. I tried using OCR tools, but they only return names or text without any structural context. Can OCR be used for this, or would a vision model like layout detection work better? Even if layout detection gives bounding boxes for names, how can I determine the hierarchy—like who works under whom—based on visual elements like arrows, lines, or positioning? Has anyone built a pipeline or model for this type of hierarchical extraction?