---
base_model: unsloth/qwen2.5-coder-32b-instruct-bnb-4bit
library_name: peft
datasets:
  - 100suping/ko-bird-sql-schema
  - won75/text_to_sql_ko
language:
  - ko
pipeline_tag: text-generation
tags:
  - SQL
  - lora
  - adapter
  - instruction-tuning
---

# 100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter

This repo contains a LoRA (Low-Rank Adaptation) adapter for [unsloth/qwen2.5-coder-32b-instruct-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-coder-32b-instruct-bnb-4bit).

The adapter was created through instruction tuning to improve the model's SQL generation capability for Korean questions in a multi-DB context.

## Model Details

### Model Description

- Base Model: unsloth/Qwen2.5-Coder-32B-Instruct
- Model type: Causal language model (LoRA adapter, loaded via PEFT)
- Task: Instruction following / text-to-SQL
- Language(s): Korean questions, SQL output
- Training Data: 100suping/ko-bird-sql-schema, won75/text_to_sql_ko

## How to Get Started with the Model

To use this LoRA adapter, refer to the following code:

### Prompt

The prompt strings are Korean; English translations are given in the comments.

```python
# "You are a member of a team that turns user input into MySQL queries.
#  Your job is to use the (context) below, which contains the DB name and the
#  meta information of the tables in that DB, to write a MySQL query that
#  matches the given question (user_question)."
GENERAL_QUERY_PREFIX = """당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.

(context)
{context}
"""

# "Please write a syntactically correct MySQL query for the given question
#  (user_question)."
GENERATE_QUERY_INSTRUCTIONS = """
주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
"""

### Preprocess Functions

```python
def get_conversation_data(examples):
    """Turn a batch of (question, schema, SQL) columns into chat-format conversations."""
    questions = examples["question"]
    schemas = examples["schema"]
    sql_queries = examples["SQL"]
    convos = []
    for question, schema, sql in zip(questions, schemas, sql_queries):
        conv = [
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ]
        convos.append(conv)
    return {"conversation": convos}

def formatting_prompts_func(examples):
    """Render each conversation into a plain string via the tokenizer's chat template."""
    convos = examples["conversation"]
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
    return {"text": texts}
```
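
As a self-contained sanity check (a sketch with toy data; the shortened prompt strings are stand-ins, not the originals), the mapping above turns parallel dataset columns into three-turn conversations:

```python
# Shortened stand-ins for the Korean prompt strings (assumptions for demo only).
GENERAL_QUERY_PREFIX = "(context)\n{context}\n"
GENERATE_QUERY_INSTRUCTIONS = "Write a MySQL query.\n"
FENCE = "`" * 3  # literal triple backticks, built indirectly so this example nests cleanly

def get_conversation_data(examples):
    """Same mapping as above: one system/user/assistant conversation per row."""
    convos = []
    for question, schema, sql in zip(examples["question"], examples["schema"], examples["SQL"]):
        convos.append([
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": FENCE + "sql\n" + sql + ";\n" + FENCE},
        ])
    return {"conversation": convos}

batch = {
    "question": ["What is the most popular movie?"],
    "schema": ["DB: movie_platform"],
    "SQL": ["SELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1"],
}
conv = get_conversation_data(batch)["conversation"][0]
print([turn["role"] for turn in conv])  # ['system', 'user', 'assistant']
```

With the 🤗 Datasets library, these functions would typically be applied with `dataset.map(..., batched=True)`.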

### Example input

The rendered training sample below uses the Korean system prompt; the user turn asks "What is the most popular movie? When was it released, and who is its director?"

````
<|im_start|>system
당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.

(context)
DB: movie_platform
table DDL: CREATE TABLE `movies` ( `movie_id` INTEGER `movie_title` TEXT `movie_release_year` INTEGER `movie_url` TEXT `movie_title_language` TEXT `movie_popularity` INTEGER `movie_image_url` TEXT `director_id` TEXT `director_name` TEXT `director_url` TEXT PRIMARY KEY (movie_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists`(user_id) FOREIGN KEY (list_id) REFERENCES `lists`(list_id) FOREIGN KEY (user_id) REFERENCES `ratings_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (movie_id) REFERENCES `movies`(movie_id) );

주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
<|im_end|>
<|im_start|>user
가장 인기 있는 영화는 무엇인가요? 그 영화는 언제 개봉되었고 누가 감독인가요?<|im_end|>
<|im_start|>assistant
```sql
SELECT movie_title, movie_release_year, director_name FROM movies ORDER BY movie_popularity DESC LIMIT 1 ;
```<|im_end|>
````

### Inference

The snippet assumes `model` and `tokenizer` are already loaded (the base model with this adapter applied, e.g. via transformers and peft), and that `context`, `user_question`, and `max_new_tokens` are defined.

```python
messages = [
    {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=context) + GENERATE_QUERY_INSTRUCTIONS},
    {"role": "user", "content": "user_question: " + user_question},
]

# Render the chat template with a generation prompt so the model answers as the assistant.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=max_new_tokens,
)
# Strip the prompt tokens so only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
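
Because the assistant turns in training wrap every query in a ```sql fence, the decoded response normally arrives fenced as well. A small post-processing helper (a sketch, not part of the original card) can pull out the bare statement:

```python
import re

def extract_sql(response: str) -> str:
    """Extract the SQL statement from a fenced ```sql ... ``` model response."""
    match = re.search(r"`{3}sql\s*(.*?)\s*`{3}", response, flags=re.DOTALL)
    return match.group(1) if match else response.strip()

fence = "`" * 3  # literal triple backticks, built indirectly so this example nests cleanly
demo = fence + "sql\nSELECT movie_title FROM movies LIMIT 1;\n" + fence
print(extract_sql(demo))  # SELECT movie_title FROM movies LIMIT 1;
```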

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- Training regime: [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.13.2