🌟 Table Extraction Tool: OCR & Computer Vision for Structured Data

License: MIT

Overview

Table Transformer is an open-source tool that uses OCR and computer vision to extract structured tabular data from images. It is suited to LLM preprocessing, data analysis pipelines, and automated data extraction.

Features

  • 📊 Automatic Table Detection: Effortlessly detect tables in images.
  • 📝 OCR-based Document Processing: Extract text with high accuracy.
  • 🧠 Integrated Models: Seamlessly combine OCR and table detection models.
  • 💾 Flexible Export Options: Export data as DataFrame, HTML, CSV, and more.
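
The export options above can be illustrated with a short sketch. It assumes an extracted table is already available as a list of rows, with the first row as the header. The `to_csv_text` and `to_html_table` helpers are hypothetical names, not part of this repository, and they use only the Python standard library.

```python
import csv
import html
import io


def to_csv_text(rows):
    """Serialize a list of rows to CSV text (first row is treated as the header)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(rows):
    """Render rows as a minimal HTML table, escaping the text of every cell."""
    header, *body = rows
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{html.escape(str(c))}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("  <tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical extracted table, used only for illustration.
rows = [["Item", "Qty"], ["Bolt, M4", "10"], ["Nut & washer", "25"]]
csv_text = to_csv_text(rows)
html_text = to_html_table(rows)
```

With pandas installed, `DataFrame.to_csv` and `DataFrame.to_html` cover the same ground once the rows are loaded into a DataFrame.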

Tool Overview

Screenshots: image upload; table detection & extraction; table in HTML format; table exported as CSV.

Open-Source Tools Used

The installation steps in this README pull in the following open-source packages:

  • PaddlePaddle and PaddleOCR
  • Ultralytics
  • pandas
  • Streamlit


Installation

Prerequisites

  • Python 3.8+
  • Conda

Setup

  1. Clone the Repository

    Clone the repository to your local machine:

    git clone https://github.com/Sudhanshu1304/table-transformer.git
    cd table-transformer
    
  2. Create and Activate Conda Environment

    Create a new conda environment and activate it:

    conda create --name myenv python=3.12.7
    conda activate myenv
    
  3. Install PaddlePaddle

    Install PaddlePaddle in the conda environment:

    python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
    
  4. Install PaddleOCR

    Install PaddleOCR:

    pip install paddleocr
    
  5. Install Additional Dependencies

    Install other required packages:

    pip install ultralytics pandas
    pip install streamlit
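
After the setup steps, a quick check can confirm that the environment is complete. The sketch below is an assumption-laden helper rather than part of the repository: `check_environment` is a hypothetical name, and the import names are the ones these packages are normally imported under (the `paddlepaddle` distribution is imported as `paddle`).

```python
import importlib.util
import sys

# Import names assumed for the packages installed above
# (the paddlepaddle distribution is imported as "paddle").
REQUIRED_MODULES = ["paddle", "paddleocr", "ultralytics", "pandas", "streamlit"]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return (python_ok, {module_name: is_installed}) without importing the modules."""
    python_ok = sys.version_info >= min_python
    found = {name: importlib.util.find_spec(name) is not None for name in modules}
    return python_ok, found


if __name__ == "__main__":
    python_ok, found = check_environment()
    print(f"Python {sys.version.split()[0]}: {'ok' if python_ok else 'too old'}")
    for name, ok in found.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated conda environment; any module reported as missing points to the installation step that needs repeating.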
    

Project Structure

project/
├── src/
│   ├── streamlit_app.py       # Streamlit application
│   ├── table_creator/
│   │   └── processing.py      # Core processing logic
│   └── models/
│       └── text.py            # Table detection and text recognition
│
├── requirements.txt           # Dependencies
├── README.md                  # Project documentation
└── .gitignore                 # Git ignore configuration
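
As a rough illustration of what a table-building step can involve, the sketch below arranges OCR text boxes into rows and columns. It is not the code in `processing.py`; the `(text, x, y)` input format, the `boxes_to_rows` name, and the `row_tolerance` parameter are all assumptions made for this example.

```python
def boxes_to_rows(boxes, row_tolerance=10):
    """Group (text, x, y) OCR boxes into rows, then order each row left to right.

    x and y are the centre coordinates of each box in pixels. A box joins the
    current row when its y is within row_tolerance of that row's first box.
    """
    rows = []  # each entry: (anchor_y, [(x, text), ...])
    for text, x, y in sorted(boxes, key=lambda b: b[2]):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Sort the cells of each row by x and keep only the text.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Hypothetical OCR output for a 2x2 table, listed out of order.
boxes = [("10", 210, 62), ("Item", 40, 20), ("Bolt", 42, 60), ("Qty", 200, 22)]
table = boxes_to_rows(boxes)
```

This sketch only handles complete rows. Tables with empty cells also need columns aligned across rows, which is left out here.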

Usage

Run the Streamlit app to interact with the tool:

streamlit run src/streamlit_app.py
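
When several Python environments are installed, it can help to launch the app through the interpreter of the active conda environment. A minimal sketch, assuming it is run from the repository root: `build_streamlit_command` and `launch` are hypothetical helpers, and `--server.port` is Streamlit's option for choosing the port.

```python
import subprocess
import sys


def build_streamlit_command(app_path="src/streamlit_app.py", port=None):
    """Build a `python -m streamlit run ...` command for the current interpreter."""
    command = [sys.executable, "-m", "streamlit", "run", app_path]
    if port is not None:
        command += ["--server.port", str(port)]
    return command


def launch(port=None):
    """Start the app and block until the Streamlit server is stopped."""
    subprocess.run(build_streamlit_command(port=port), check=True)
```

Calling `launch(port=8502)` from the activated environment would start the app on port 8502 instead of Streamlit's default.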

Contributions

Contributions are welcome! Please fork the repository and submit a pull request with your improvements or new features.

License

This project is licensed under the MIT License.


Connect with Us

Stay updated and connect for any queries or contributions:


Support

If you find this tool useful, please consider giving it a ⭐ on GitHub. Your support is greatly appreciated!

Happy Extracting!
