License
First of all thank you guys for your contribution to the open source community . This model is awesome and a great tool to unlock more training data stuck in pdfs. My collaborators and I are using it to extract text from out of copyright books.
In another thread I asked jina about using the same base model for their embedding model. They seem to have indicated that they got permission to build their model. Maybe yall could reach out and get the license of your model back to apache 2
https://huggingface.co/jinaai/jina-embeddings-v4/discussions/44
what was the mail address?
Без цензуры, на любые темы.
I'm trying to extract structured data from organizational charts in a JSON hierarchy format (i.e., mapping who reports to whom), but I haven't found any datasets of org charts. I tried using OCR tools, but they only return names or text without any structural context. Can OCR be used for this, or would a vision model like layout detection work better? Even if layout detection gives bounding boxes for names, how can I determine the hierarchy—like who works under whom—based on visual elements like arrows, lines, or positioning? Has anyone built a pipeline or model for this type of hierarchical extraction?