Llama-3.1-8B-Instruct-text-to-sql-adapter

This is a LoRA Adapter fine-tuned from meta-llama/Llama-3.1-8B-Instruct for text-to-SQL generation tasks.

πŸ“‹ Model Description

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Model Type: LoRA Adapter
  • Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
  • Training Dataset: chrisjcc/text-to-sql-spider-dataset
  • Task: Convert natural language questions into SQL queries
  • Language: English
  • License: apache-2.0

🎯 Intended Use

This model is designed to translate natural language questions into SQL queries for database interaction. It works best when provided with:

  1. A database schema (CREATE TABLE statements)
  2. A natural language question about the data

πŸš€ Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

# Optionally merge the adapter into the base model for faster inference
model = model.merge_and_unload()

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False,
)

# Example usage
schema = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP
);
"""

question = "Show me all users who registered in the last 7 days"

messages = [
    {
        "role": "system",
        "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"
    },
    {"role": "user", "content": question}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
sql_query = outputs[0]['generated_text'][len(prompt):].strip()

print("Generated SQL:", sql_query)

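Because the adapter was trained with QLoRA, the base model can also be loaded in 4-bit at inference time to cut memory usage. The snippet below is a minimal sketch assuming bitsandbytes is installed; the adapter is left unmerged, since merging into quantized weights is generally not recommended.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization, the same scheme QLoRA uses
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Keep the adapter unmerged on top of the quantized base weights
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
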
βš™οΈ Training Configuration

Model Architecture

  • LoRA Rank (r): 16
  • LoRA Alpha: 32
  • LoRA Dropout: 0.1
  • Target Modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  • Max Sequence Length: 2048

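The LoRA settings above correspond roughly to the following PEFT LoraConfig. This is a sketch, not the exact training script; bias and task_type are typical defaults assumed here.

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=32,      # scaling factor
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",        # assumed default
    task_type="CAUSAL_LM",
)
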
Training Hyperparameters

  • Number of Epochs: 5
  • Per-Device Batch Size: 1
  • Gradient Accumulation Steps: 8
  • Effective Batch Size: 8
  • Learning Rate: 5e-05
  • Learning Rate Scheduler: Constant
  • Optimizer: AdamW (torch fused)
  • Weight Decay: 0
  • Warmup Ratio: 0.03
  • Max Gradient Norm: 1.0
  • Precision: bfloat16

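Assuming TRL's SFTTrainer was used (TRL is listed under Training Infrastructure below), these hyperparameters map onto an SFTConfig roughly as follows; the output_dir is hypothetical, and the max-sequence-length argument name varies by TRL version, so it is left as a comment.

from trl import SFTConfig

training_args = SFTConfig(
    output_dir="llama-3.1-8b-text-to-sql",   # hypothetical path
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,           # effective batch size 1 x 8 = 8
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
    # max sequence length (2048) is set via max_seq_length / max_length,
    # depending on the TRL version
)
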
Training Infrastructure

  • Hardware: NVIDIA GPU with bfloat16 support
  • Framework: Transformers + PEFT + TRL
  • Gradient Checkpointing: Enabled
  • Flash Attention: Enabled

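A QLoRA-style base-model load consistent with this setup might look like the sketch below; it assumes flash-attn and bitsandbytes are installed and is not the exact training code.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention enabled
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Casts norms, enables input grads, and turns on gradient checkpointing
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
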
πŸ“Š Training Details

The model was fine-tuned using Supervised Fine-Tuning (SFT) with the following approach:

  1. Dataset Format: Chat template with system/user/assistant roles
  2. System Prompt: Includes database schema for context
  3. User Prompt: Natural language question
  4. Assistant Response: SQL query

Example Training Sample

{
  "messages": [
    {
      "role": "system",
      "content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."
    },
    {
      "role": "user",
      "content": "Show me all customers from New York"
    },
    {
      "role": "assistant",
      "content": "SELECT * FROM customers WHERE city = 'New York';"
    }
  ]
}

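A helper that wraps a raw example into this message format might look like the sketch below; the field names schema, question, and sql are illustrative and may not match the dataset's actual column names.

def build_messages(schema: str, question: str, sql: str) -> dict:
    """Wrap one text-to-SQL example in the chat format used for SFT."""
    return {
        "messages": [
            {
                "role": "system",
                "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}",
            },
            {"role": "user", "content": question},
            {"role": "assistant", "content": sql},
        ]
    }
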
πŸŽ“ Model Performance

The model has been trained to generate syntactically correct SQL queries for various database schemas. Performance may vary based on:

  • Complexity of the database schema
  • Ambiguity in the natural language question
  • Similarity to training data

⚠️ Limitations

  • Schema Knowledge: The model must be provided with the database schema at inference time
  • SQL Dialect: Primarily trained on standard SQL; may require adjustments for specific database systems (PostgreSQL, MySQL, etc.)
  • Complex Queries: Performance may degrade on very complex multi-join queries or advanced SQL features
  • Ambiguity: May struggle with ambiguous natural language questions
  • Context Length: Limited to 2048 tokens (including schema + question)

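Because the training context window was 2048 tokens, it helps to check prompt length before generating. A minimal sketch, reusing the tokenizer and messages from the Usage section above:

MAX_CONTEXT = 2048
MAX_NEW_TOKENS = 256

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
n_prompt_tokens = len(tokenizer(prompt)["input_ids"])

if n_prompt_tokens > MAX_CONTEXT - MAX_NEW_TOKENS:
    # Schema plus question is too long; trim the schema to the relevant tables
    raise ValueError(f"Prompt uses {n_prompt_tokens} tokens; shorten the schema.")
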
πŸ”„ Version History

  • v1.0: Initial release with 5 epochs of training

πŸ“š Citation

If you use this model in your research or application, please cite:

@misc{chrisjcc_Llama_3.1_8B_Instruct_text_to_sql_adapter,
  author = {Christian Contreras Campana},
  title = {Llama-3.1-8B-Instruct-text-to-sql-adapter: Fine-tuned Text-to-SQL Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter}}
}

πŸ“„ License

This model is released under the Apache-2.0 license. The base model, meta-llama/Llama-3.1-8B-Instruct, is governed by its own license terms (the Llama 3.1 Community License).

πŸ™ Acknowledgments

🀝 Contact

For questions or feedback, please open an issue on the model repository.


  • Model Type: LoRA adapter weights
  • Training Date: 2025
  • Model Size: ~8B parameters
