title: nlp-to-sql-chat-assistant
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.15.0
app_file: app/app.py
pinned: false
SQL Chat Assistant
Public Testing
https://huggingface.co/spaces/DevashishNagpal/nlp-to-sql-chat-assistant
Note: If the model fails to understand the user query, it generates one of the following fallback queries based on the input:
SELECT * FROM employees;
SELECT * FROM DEPARTMENTS;
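The fallback behaviour in the note can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `is_valid_sql` and `choose_query` and the keyword rule that picks between the two fallback queries are assumptions. Validation here simply asks SQLite to compile the generated statement with `EXPLAIN`.

```python
import sqlite3

# The two fallback queries listed above. The keyword rule below is an assumption.
DEPARTMENTS_FALLBACK = "SELECT * FROM DEPARTMENTS;"
EMPLOYEES_FALLBACK = "SELECT * FROM employees;"


def is_valid_sql(conn, sql):
    """Return True if SQLite can compile the statement against the schema."""
    if not sql or not sql.strip():
        return False
    try:
        # EXPLAIN compiles the statement without running the query itself.
        conn.execute("EXPLAIN " + sql)
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False


def choose_query(conn, user_text, generated_sql):
    """Use the model output when it compiles, otherwise pick a fallback."""
    if is_valid_sql(conn, generated_sql):
        return generated_sql
    # Hypothetical rule: mention of departments selects the departments table.
    if "department" in user_text.lower():
        return DEPARTMENTS_FALLBACK
    return EMPLOYEES_FALLBACK
```

With this sketch, a malformed model output combined with an input such as "list departments" would yield the DEPARTMENTS fallback, while any other unparseable request would yield the employees fallback.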
Please note that this project is still under development, and the model may not work as expected for all queries. Feel free to test it out and provide feedback for improvements.
Example queries:
- "Show me all the employees"
- "Show me the employees who are managers"
- "Who is the manager of Marketing department?"
Overview
This project is a Flask-based chat assistant that converts natural language queries into SQL statements using state-of-the-art NLP models. The system leverages Hugging Face transformer models, sentence embedding techniques, and fine-tuning approaches to generate accurate SQL queries for an SQLite database.
The primary goal of this project is to enable users to interact with structured data using conversational language, making database queries accessible to non-technical users.
Approach
1. Pretrained Transformer Models (Hugging Face)
I first tested several pretrained Hugging Face models for generating SQL queries from natural language input. Most of them produced inconsistent results because they were trained on general-purpose datasets rather than on this project's schema.
2. Sentence Transformers + Cosine Similarity + Parameter Extraction
To improve query generation, I experimented with an approach that captures the semantic meaning of user queries and maps them to predefined SQL templates using:
- Sentence embeddings: Extracting vector representations of queries.
- Cosine similarity: Matching user queries with predefined SQL structures.
- Regular expression templates: Extracting SQL parameters dynamically to refine query formation.
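The three steps above can be sketched as follows. To keep the example runnable with only the standard library, it uses word-count vectors as a stand-in for real sentence embeddings; a sentence-transformer model would replace `embed` in practice. The template list, the `match_query` helper, the similarity threshold, and the `manager` and `name` column names are all hypothetical and not taken from the project.

```python
import math
import re
from collections import Counter

# Hypothetical templates: example phrasing, parameterized SQL, optional regex.
TEMPLATES = [
    ("show all employees", "SELECT * FROM employees;", None),
    ("show all departments", "SELECT * FROM departments;", None),
    (
        "who is the manager of the department",
        "SELECT manager FROM departments WHERE name = ?;",
        re.compile(r"manager of (?:the )?(\w+) department", re.IGNORECASE),
    ),
]


def embed(text):
    """Bag-of-words stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_query(user_text, threshold=0.3):
    """Return (sql, params) for the closest template, or None if nothing fits."""
    query_vec = embed(user_text)
    score, sql, pattern = max(
        ((cosine(query_vec, embed(ex)), sql, pat) for ex, sql, pat in TEMPLATES),
        key=lambda item: item[0],
    )
    if score < threshold:
        return None
    if pattern is None:
        return sql, ()
    # Extract the SQL parameters from the original wording.
    found = pattern.search(user_text)
    return (sql, found.groups()) if found else None
```

Returning the SQL with `?` placeholders and a separate parameter tuple lets the caller pass both to `sqlite3`'s `execute`, so extracted values are never spliced into the query string.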
3. Fine-Tuning T5-Small with ONNX Quantization
To enhance accuracy, I fine-tuned the t5-small model using a custom dataset based on the structure of my SQLite database.
ONNX quantization was applied to reduce the model size and improve deployment efficiency while staying within hosting constraints.
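To show why quantization reduces model size, the sketch below applies symmetric 8-bit quantization to a short list of weights in pure Python. It illustrates only the underlying arithmetic and is not the ONNX tooling used in this project. The function names and sample weights are hypothetical, and real quantizers may use different schemes, such as zero points or per-channel scales.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 1.27]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each weight is stored in one byte plus a shared scale, instead of four bytes for a 32-bit float, at the cost of a rounding error of at most half the scale per weight.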
Installation Guide
1. Clone the Repository
git clone https://github.com/DevashishXO/SQLite-Chat-Assistant.git
cd SQLite-Chat-Assistant
2. Install Dependencies
pip install -r requirements.txt
3. Set up the SQLite Database
python data/initialize_db.py
4. Run the Flask App
# PowerShell syntax; on bash or zsh use: export FLASK_APP="app.main:app"
$env:FLASK_APP="app.main:app"
flask run
Models Explored for the Project
I experimented with multiple models before settling on the fine-tuned t5-small model with ONNX quantization.
Models Explored and Rejected
Model | Reason for Rejection |
---|---|
mrm8488/t5-base-finetuned-wikiSQL | Produced incorrect table references due to its focus on WikiSQL datasets. |
tscholak/cxmefzzi | Large model size, requiring high computational resources for inference. |
HridaAI/Hrida-T2SQL-3B-V0.2 | Optimized for Spider dataset, failing on custom schemas. |
cssupport/t5-small-awesome-text-to-sql | Limited accuracy without schema-specific fine-tuning. |
hasibzunair/t5-small-spider-sql | Required significant schema customization. |
hkunlp/text2sql-t5-small | Generated incomplete queries. |
szarnyasg/transformer-text2sql | Poor generalization on varied SQL queries. |
jimypbr/gpt2-finetuned-wikitext2 | GPT-2 was not suited for structured SQL generation. |
Future Improvements
- Enhance dataset for fine-tuning.
- Implement caching for faster response times.
- Deploy the model using Hugging Face Spaces or an optimized cloud server.
Author
Devashish Nagpal
GitHub: github.com/DevashishXO
LinkedIn: linkedin.com/in/devashishnagpal
License
This project is open-source under the MIT License.