# Llama-3.1-8B-Instruct-text-to-sql-adapter
This is a LoRA adapter fine-tuned from meta-llama/Llama-3.1-8B-Instruct for text-to-SQL generation.
## Model Description
- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Model Type: LoRA Adapter
- Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
- Training Dataset: chrisjcc/text-to-sql-spider-dataset
- Task: Convert natural language questions into SQL queries
- Language: English
- License: apache-2.0
## Intended Use
This model is designed to translate natural language questions into SQL queries for database interaction. It works best when provided with:
- A database schema (CREATE TABLE statements)
- A natural language question about the data
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

# For inference, merge adapter for better performance (optional)
model = model.merge_and_unload()

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False,
)

# Example usage
schema = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP
);
"""
question = "Show me all users who registered in the last 7 days"

messages = [
    {
        "role": "system",
        "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"
    },
    {"role": "user", "content": question}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
sql_query = outputs[0]["generated_text"][len(prompt):].strip()
print("Generated SQL:", sql_query)
```
## Training Configuration
### Model Architecture
- LoRA Rank (r): 16
- LoRA Alpha: 32
- LoRA Dropout: 0.1
- Target Modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- Max Sequence Length: 2048
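For reference, this setup corresponds roughly to the following `peft` configuration. It is a reconstruction from the values listed above, not the original training script; the `bias` and `task_type` settings are assumptions:

```python
from peft import LoraConfig

# LoRA configuration reconstructed from the hyperparameters listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: bias terms left untouched
    task_type="CAUSAL_LM",  # assumption: standard causal-LM task type
)
```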
### Training Hyperparameters
- Number of Epochs: 5
- Per-Device Batch Size: 1
- Gradient Accumulation Steps: 8
- Effective Batch Size: 8
- Learning Rate: 5e-05
- Learning Rate Scheduler: Constant
- Optimizer: AdamW (torch fused)
- Weight Decay: 0
- Warmup Ratio: 0.03
- Max Gradient Norm: 1.0
- Precision: bfloat16
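These values map onto a TRL `SFTConfig` roughly as sketched below. This is an approximation rather than the original script; argument names (e.g. `max_seq_length`) can differ between TRL versions, and `output_dir` is a hypothetical placeholder:

```python
from trl import SFTConfig

# Approximate training arguments reconstructed from the values listed above
training_args = SFTConfig(
    output_dir="llama-3.1-8b-text-to-sql",  # hypothetical output path
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,          # effective batch size 8
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
    max_seq_length=2048,                    # name may vary across TRL versions
)
```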
### Training Infrastructure
- Hardware: NVIDIA GPU with bfloat16 support
- Framework: Transformers + PEFT + TRL
- Gradient Checkpointing: Enabled
- Flash Attention: Enabled
## Training Details
The model was fine-tuned using Supervised Fine-Tuning (SFT) with the following approach:
- Dataset Format: Chat template with system/user/assistant roles
- System Prompt: Includes database schema for context
- User Prompt: Natural language question
- Assistant Response: SQL query
### Example Training Sample
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."
    },
    {
      "role": "user",
      "content": "Show me all customers from New York"
    },
    {
      "role": "assistant",
      "content": "SELECT * FROM customers WHERE city = 'New York';"
    }
  ]
}
```
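During SFT, each sample's `messages` list is rendered into a single training string with the model's chat template. A minimal sketch of that step, reusing the example sample above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

sample = {
    "messages": [
        {"role": "system", "content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."},
        {"role": "user", "content": "Show me all customers from New York"},
        {"role": "assistant", "content": "SELECT * FROM customers WHERE city = 'New York';"},
    ]
}

# Render the full conversation (including the assistant target) into the
# training text that the SFT trainer then tokenizes
text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
print(text)
```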
## Model Performance
The model has been trained to generate syntactically correct SQL queries for various database schemas. Performance may vary based on:
- Complexity of the database schema
- Ambiguity in the natural language question
- Similarity to training data
## Limitations
- Schema Knowledge: The model must be provided with the database schema at inference time
- SQL Dialect: Primarily trained on standard SQL; may require adjustments for specific database systems (PostgreSQL, MySQL, etc.)
- Complex Queries: Performance may degrade on very complex multi-join queries or advanced SQL features
- Ambiguity: May struggle with ambiguous natural language questions
- Context Length: Limited to 2048 tokens (including schema + question)
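Given the 2048-token budget, it can help to check the rendered prompt length before generation and leave headroom for the generated SQL. A minimal sketch (the `schema` and `question` values are placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100));"
question = "Show me all users"

messages = [
    {"role": "system", "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"},
    {"role": "user", "content": question},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Leave headroom for the generated SQL (e.g. 256 new tokens)
n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > 2048 - 256:
    print(f"Prompt uses {n_tokens} tokens; consider trimming unused tables from the schema.")
```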
## Version History
- v1.0: Initial release with 5 epochs of training
## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{chrisjcc_Llama_3.1_8B_Instruct_text_to_sql_adapter,
  author       = {Christian Contreras Campana},
  title        = {Llama-3.1-8B-Instruct-text-to-sql-adapter: Fine-tuned Text-to-SQL Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter}}
}
```
## License
This adapter is released under the Apache 2.0 license. The base model, meta-llama/Llama-3.1-8B-Instruct, has its own license terms.
## Acknowledgments
- Base model: meta-llama/Llama-3.1-8B-Instruct
- Training framework: Hugging Face Transformers, PEFT, TRL
- Dataset: chrisjcc/text-to-sql-spider-dataset
## Contact
For questions or feedback, please open an issue on the model repository.
- Model Type: LoRA adapter weights
- Training Date: 2025
- Model Size: ~8B parameters (base model; the adapter itself is much smaller)