TinyLlama-1.1B-Chat-LoRA-Fused-v1.0 – Natural-Language-to-SQL

TinyLlama-1.1B-Chat-LoRA-Fused-v1.0 is a 1.1 billion parameter model derived from TinyLlama/TinyLlama-1.1B-Chat-v1.0.
Using parameter-efficient LoRA fine-tuning and the new Apple-Silicon-native MLX framework, the model has been specialised to convert plain-English questions into syntactically correct SQL queries for relational databases.
After training, the LoRA adapters were merged ("fused") into the base weights, so you only need this single checkpoint for inference.
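
For intuition, fusing simply folds each adapter's low-rank update back into the corresponding base weight matrix. A minimal sketch of the arithmetic with toy shapes (generic LoRA maths, not the actual MLX fusion code):

import numpy as np

# Toy shapes: one base weight matrix W plus a rank-16 LoRA adapter (A, B).
d_out, d_in, rank, alpha = 256, 256, 16, 32
W = np.random.randn(d_out, d_in).astype(np.float16)
A = np.random.randn(rank, d_in).astype(np.float16)  # "down" projection
B = np.zeros((d_out, rank), dtype=np.float16)        # "up" projection, initialised to zero

# Fusing adds the scaled low-rank update into the base weight once,
# so inference needs no adapter-aware code paths or separate adapter files.
W_fused = W + (alpha / rank) * (B @ A)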


πŸ—οΈ Key Facts

Property             Value
Base model           TinyLlama 1.1B Chat v1.0
Task                 Natural-Language → SQL generation
Fine-tuning method   Low-Rank Adaptation (LoRA), rank = 16
Training framework   MLX 0.8 + PEFT
Hardware             MacBook Pro M4 Pro (20-core GPU)
Checkpoint size      2.1 GB (fp16, fused)
License              Apache 2.0

✨ Intended Use

  • Interactive data exploration inside BI notebooks or chatbots.
  • Customer-support analytics: empower non-SQL users to ask free-form questions.
  • Education & demos showing how LoRA + MLX enables rapid on-device fine-tuning.

The model was trained on synthetic NL-SQL pairs for demo purposes. Do not deploy it in production for mission-critical SQL generation without additional evaluation on your own schema and a security review.


💻 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fused checkpoint; no separate LoRA adapter files are needed.
model_id = "jero2rome/tinyllama-1.1b-chat-lora-fused-v1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = """\
### Database schema
table orders(id, customer_id, total, created_at)
table customers(id, name, country)

### Question
List total sales per country ordered by total descending."""

# Tokenise the prompt and generate the SQL continuation.
inputs = tok(prompt, return_tensors="pt")
sql_out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(sql_out[0], skip_special_tokens=True))
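
Because the checkpoint was trained and fused with MLX, it can also be run natively on Apple Silicon via the mlx-lm package; a rough sketch, assuming the fused weights load directly from the Hub:

from mlx_lm import load, generate

# Reuses the `prompt` string defined above.
model, tokenizer = load("jero2rome/tinyllama-1.1b-chat-lora-fused-v1.0")
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))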

πŸ‹οΈβ€β™‚οΈ Training Details

  • Data – 10K synthetic NL/SQL pairs auto-generated from an open-domain schema list, then manually spot-checked for correctness.
  • Pre-processing – schema + question paired using the Text-to-SQL prompt pattern; SQL statements lower-cased; no anonymisation.
  • Hyper-parameters
    • batch size = 32 (gradient accumulation = 4)
    • learning rate = 2e-4 (cosine schedule)
    • epochs = 3
    • LoRA rank = 16, α = 32
    • fp16 mixed precision

Total training time ≈ 5 minutes on Apple Silicon.
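
For reference, the adapter settings above map roughly onto a PEFT configuration like the one below (a sketch; the target modules are an assumption, since the card does not list which projection layers were adapted):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # LoRA rank
    lora_alpha=32,                        # scaling factor α
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections only
    task_type="CAUSAL_LM",
)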


🌱 Environmental Impact

LoRA fine-tuning on consumer Apple Silicon is energy-efficient: the entire run completed in roughly five minutes on a single MacBook Pro, so the associated energy use was minimal.


πŸ› οΈ Limitations & Biases

  • Trained on a synthetic, limited dataset → may under-perform on real production schemas.
  • Does not perform schema linking; you must include the relevant schema in the prompt.
  • Generated SQL is not guaranteed to be safe; always validate queries before execution (see the sketch below).
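
One lightweight way to validate generated SQL before it touches a real database is to compile it against an empty in-memory copy of the schema. A minimal standard-library sketch (the CREATE TABLE statements mirror the Quick Start schema and are purely illustrative):

import sqlite3

def sql_compiles(sql: str, schema_ddl: list[str]) -> bool:
    """Return True if `sql` parses and binds against an empty copy of the schema."""
    con = sqlite3.connect(":memory:")
    try:
        for ddl in schema_ddl:
            con.execute(ddl)
        # EXPLAIN compiles the statement without executing the underlying query.
        con.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        con.close()

schema_ddl = [
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL, created_at TEXT)",
    "CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)",
]
print(sql_compiles("SELECT c.country, SUM(o.total) AS total_sales "
                   "FROM orders o JOIN customers c ON o.customer_id = c.id "
                   "GROUP BY c.country ORDER BY total_sales DESC", schema_ddl))

This only checks that the query compiles against the schema; it does not guard against destructive statements or enforce access control, so a proper review is still required for production use.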

✍️ Citation

@misc{mohanan2024tinyllama_sql_lora,
  title   = {TinyLlama-1.1B-Chat-LoRA-Fused-v1.0},
  author  = {Jerome Mohanan},
  note    = {Hugging Face repository: https://huggingface.co/jero2rome/tinyllama-1.1b-chat-lora-fused-v1.0},
  year    = {2024}
}

📫 Contact

Questions or feedback? Ping @jero2rome on Hugging Face or email [email protected].
