Model Card for Model ID

This project builds a text scanner that converts images of paper pages into machine-readable formats (e.g., Markdown, JSON). It is the child of Nougat and thus the grandchild of Donut.

The key idea is to combine the bounding-box modality with text so that the model reads a page the way a scanner sweeps across it: at each step it predicts not only the next token but also the next position (the bounding box of that token on the page).
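The "next token plus next position" idea can be illustrated with a minimal, framework-free sketch. It assumes a hypothetical model interface `step_fn` that takes the history of (token, box) pairs and returns token logits together with a predicted box, and it pairs cross-entropy for the token with an L1 loss on the box. The names `step_fn`, `decode`, `joint_loss`, `box_weight`, and the box format `(x0, y0, x1, y1)` are illustrative assumptions, not Lougat's actual heads or losses; see the linked repository for the real implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed box format: (x0, y0, x1, y1), normalized to [0, 1] page coordinates.
BBox = Tuple[float, float, float, float]


@dataclass
class Step:
    token: int
    bbox: BBox


def decode(step_fn: Callable[[List[Step]], Tuple[Sequence[float], BBox]],
           eos_id: int, max_len: int = 32) -> List[Step]:
    """Greedy decoding in which every step emits a token AND its position.

    step_fn is a hypothetical stand-in for the model: given the history of
    (token, bbox) pairs, it returns (token_logits, next_bbox).
    """
    history: List[Step] = []
    for _ in range(max_len):
        logits, bbox = step_fn(history)
        token = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        history.append(Step(token, bbox))
        if token == eos_id:
            break
    return history


def joint_loss(logits: Sequence[float], target_token: int,
               pred_box: BBox, target_box: BBox, box_weight: float = 1.0) -> float:
    """Token cross-entropy plus a weighted mean-L1 box loss (assumed choice)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    ce = log_z - logits[target_token]
    l1 = sum(abs(p - t) for p, t in zip(pred_box, target_box)) / 4
    return ce + box_weight * l1


# Toy "model": emits tokens 1, 2, then EOS (0), moving its box left to right.
def toy_step(history: List[Step]) -> Tuple[List[float], BBox]:
    i = len(history)
    target = [1, 2, 0][min(i, 2)]
    logits = [0.0, 0.0, 0.0]
    logits[target] = 5.0
    return logits, (0.1 * i, 0.0, 0.1 * i + 0.1, 0.05)


steps = decode(toy_step, eos_id=0)
print([s.token for s in steps])  # [1, 2, 0]
```

Treating the box as a regression target next to the token classifier is only one possible design; a model could instead quantize coordinates into extra vocabulary tokens.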


The name "Lougat" is a combination of LLama and Nougat. The approach is a natural continuation of the paper "LOCR: Location-Guided Transformer for Optical Character Recognition" (arXiv:2403.02127).

Current Branch: The Flougat model

Other branches:

  • Florence2 + LLama → Flougat
  • Sam2 + LLama → Slougat
  • Nougat + Relative Position Embedding LLama → Rlougat

Inference and Training

Please see https://github.com/veya2ztn/Lougat
