arxiv:2409.01704

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Published on Sep 3
Submitted by HaoranWei on Sep 4
#1 Paper of the day

Abstract

Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's needs, given the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. GOT, with 580M parameters, is a unified, elegant, and end-to-end model, consisting of a high-compression encoder and a long-context decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in both slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via a simple prompt. Besides, the model supports interactive OCR features, i.e., region-level recognition guided by coordinates or colors. Furthermore, we adapt dynamic-resolution and multi-page OCR technologies to GOT for better practicality. In experiments, we provide sufficient results to demonstrate the superiority of our model.
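The abstract says dynamic-resolution technology is adapted to GOT but does not describe how. One common way to feed a large page to a fixed-input encoder is to cut it into crops that each match the encoder's input size. The minimal Python sketch here assumes a 1024×1024 tile (an assumption; the abstract does not state the input size) and simply clips edge tiles to the image border. It illustrates the general idea only and is not GOT's actual preprocessing, which may instead resize the page to a tile grid or add a global thumbnail.

```python
import math
from typing import List, Tuple

# Assumed encoder input size (hypothetical here; not stated in the abstract).
TILE = 1024

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int = TILE) -> List[Box]:
    """Split a width x height image into a row-major grid of tile-sized crops.

    Edge tiles are clipped to the image border, so they may be smaller
    than `tile`; a real pipeline would typically pad or resize them.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("dimensions must be positive")
    cols = math.ceil(width / tile)   # tiles per row
    rows = math.ceil(height / tile)  # tiles per column
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            # Clip the right/bottom edges to the image size.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


if __name__ == "__main__":
    # A 2480x3508 page (A4 at 300 dpi) yields a 3x4 grid of crops.
    boxes = tile_boxes(2480, 3508)
    print(len(boxes))  # 12
    print(boxes[-1])   # (2048, 3072, 2480, 3508)
```

The box tuples follow the (left, upper, right, lower) convention of Pillow's `Image.crop`, so each box could be passed to `img.crop(box)` before encoding; a multi-page document would simply repeat this per page.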

Community

Paper author and paper submitter:

The OCR-2.0 era is coming.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

[Image attachment: IMG_3672.jpeg]

What is the content?

Wow congrats!

Are the outputs in LaTeX? Are there alternative options?

Best.

Looks interesting

Which languages does it support?

You've linked the wrong account. The Lingyu Kong you're currently linked to (me) was not involved in this paper. Still, it is interesting work.

Models citing this paper 13

Datasets citing this paper 0

Spaces citing this paper 55

Collections including this paper 35