arxiv:2409.01704

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Published on Sep 3 · Submitted by HaoranWei on Sep 4

#1 Paper of the day

Authors:

Haoran Wei, Chenglong Liu, Lingyu Kong, Yuang Peng

Abstract

Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT, with 580M parameters, is a unified, elegant, and end-to-end model, consisting of a high-compression encoder and a long-contexts decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors. Furthermore, we also adapt dynamic resolution and multi-page OCR technologies to GOT for better practicality. In experiments, we provide sufficient results to prove the superiority of our model.

Community

HaoranWei

Paper author · Paper submitter · Sep 5

The OCR-2.0 era is coming.

maybekatz

Sep 11

What is this about?

yleo

Sep 12

Wow congrats!

The output is in LaTeX, right? Are there alternative options?

Best.

HimankJ

Sep 13 · edited Sep 13

Looks interesting

lukiod

Sep 24

Which languages does it support?

LingyuKong

Sep 24

You've linked the wrong account. The Lingyu Kong you're currently linked to, which is me, was not involved in this paper's work... But anyway, it is interesting work.