Table Extraction Tool: OCR & Computer Vision for Structured Data
Overview
Table Transformer is an open-source tool that combines OCR and computer vision models to extract structured tabular data from images. It is suited to preprocessing documents for LLMs, feeding data analysis pipelines, and automating data extraction tasks.
Features
- Automatic Table Detection: Detect tables in images.
- OCR-based Document Processing: Extract text with high accuracy.
- Integrated Models: Combine OCR and table detection models in one pipeline.
- Flexible Export Options: Export data as DataFrame, HTML, CSV, and more.
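The export options listed above can be illustrated with a minimal pandas sketch. It assumes the extracted table is already available as a list of rows with the header first; the sample data and the `rows_to_dataframe` helper are hypothetical and not part of this repository.

```python
import pandas as pd

# Hypothetical extraction result: a header row followed by data rows.
rows = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.50"],
    ["Gadget", "1", "24.00"],
]


def rows_to_dataframe(rows):
    """Build a DataFrame, treating the first row as the header."""
    header, *body = rows
    return pd.DataFrame(body, columns=header)


df = rows_to_dataframe(rows)

# Export the same table in two text formats.
csv_text = df.to_csv(index=False)    # CSV as a string
html_text = df.to_html(index=False)  # HTML <table> markup as a string

print(csv_text)
```

Passing a file path to `to_csv` or `to_html` writes the output to disk instead of returning a string.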
Open-Source Tools Used
- PaddleOCR: For text extraction.
- Hugging Face Table Detection: For table structure detection.
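One way the two model outputs could be combined is sketched below in plain Python: keep only the OCR words whose centre falls inside the detected table box, then cluster them into rows by vertical position. This is an illustrative assumption, not the logic in `processing.py`; the `(text, box)` input format, the `inside` and `group_into_rows` helpers, and the `row_tol` threshold are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Word = Tuple[str, Box]                   # (recognized text, bounding box)


def inside(box: Box, region: Box) -> bool:
    """Return True if the centre of `box` lies within `region`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def group_into_rows(words: List[Word], table: Box, row_tol: float = 10.0) -> List[List[str]]:
    """Keep words inside the table, then cluster them into rows by vertical centre."""
    kept = [w for w in words if inside(w[1], table)]
    kept.sort(key=lambda w: (w[1][1] + w[1][3]) / 2)  # top to bottom

    rows: List[List[Word]] = []
    last_cy = None
    for word in kept:
        cy = (word[1][1] + word[1][3]) / 2
        # Start a new row when the vertical gap to the previous word is large.
        if last_cy is None or abs(cy - last_cy) > row_tol:
            rows.append([])
        rows[-1].append(word)
        last_cy = cy

    # Order each row left to right and drop the boxes.
    return [[text for text, _ in sorted(row, key=lambda w: w[1][0])] for row in rows]


# Hypothetical OCR output and table box, in pixel coordinates.
words = [
    ("Qty", (120, 12, 160, 30)),
    ("Item", (10, 10, 60, 30)),
    ("Widget", (10, 50, 80, 70)),
    ("2", (120, 52, 135, 70)),
    ("Page 1", (10, 400, 70, 420)),  # lies outside the table box
]
table = (0, 0, 200, 100)

table_rows = group_into_rows(words, table)
print(table_rows)  # [['Item', 'Qty'], ['Widget', '2']]
```

A fixed pixel tolerance is the simplest choice; scanned pages with skew or varying font sizes would likely need a tolerance derived from the text height instead.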
Installation
Prerequisites
- Python 3.8+
- Conda
Setup
Clone the Repository
Clone the repository to your local machine:
git clone https://github.com/Sudhanshu1304/table-transformer.git
cd table-transformer
Create and Activate Conda Environment
Create a new conda environment and activate it:
conda create --name myenv python=3.12.7
conda activate myenv
Install PaddlePaddle
Install PaddlePaddle in the conda environment:
python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
Install PaddleOCR
Install PaddleOCR:
pip install paddleocr
Install Additional Dependencies
Install other required packages:
pip install ultralytics pandas
pip install streamlit
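After the install steps, a short check can confirm that each package is importable from the active environment. This is an optional sketch, not a script shipped with the repository; the `missing_modules` helper is hypothetical. Note that the `paddlepaddle` package is imported under the module name `paddle`.

```python
import importlib.util

# Import names of the packages installed above
# (paddlepaddle is imported as "paddle").
REQUIRED = ["paddle", "paddleocr", "ultralytics", "pandas", "streamlit"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the conda environment activated so that it inspects the same interpreter the app will use.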
Project Structure
project/
├── src/
│   ├── streamlit_app.py      # Streamlit application
│   ├── table_creator/
│   │   └── processing.py     # Core processing logic
│   └── models/
│       └── text.py           # Table detection and text recognition
│
├── requirements.txt          # Dependencies
├── README.md                 # Project documentation
└── .gitignore                # Git ignore configuration
Usage
Run the Streamlit app to interact with the tool:
streamlit run src/streamlit_app.py
Contributions
Contributions are welcome! Please fork the repository and submit a pull request with your improvements or new features.
License
This project is licensed under the MIT License.
Connect with Us
Stay updated and connect for any queries or contributions:
- GitHub: Sudhanshu1304
- LinkedIn: Sudhanshu Pandey
- Medium: @sudhanshu.dpandey
Support
If you find this tool useful, please consider giving it a ⭐ on GitHub. Your support is greatly appreciated!
Happy Extracting!