---
base_model: unsloth/qwen2.5-coder-32b-instruct-bnb-4bit
library_name: peft
datasets:
- 100suping/ko-bird-sql-schema
- won75/text_to_sql_ko
language:
- ko
pipeline_tag: text-generation
tags:
- SQL
- lora
- adapter
- instruction-tuning
---
# 100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter

This repo contains a LoRA (Low-Rank Adaptation) adapter for [unsloth/qwen2.5-coder-32b-instruct-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-coder-32b-instruct-bnb-4bit).
The adapter was created through instruction tuning to improve the model's SQL generation capability for Korean questions in a multi-database context.
## Model Details

### Model Description

- Base Model: unsloth/Qwen2.5-Coder-32B-Instruct
- Task: Instruction Following (Korean text-to-SQL)
- Language(s) (NLP): Korean (questions and prompts), SQL output
- Training Data: 100suping/ko-bird-sql-schema, won75/text_to_sql_ko
- Model type: Causal Language Model
## How to Get Started with the Model

To use this LoRA adapter, refer to the following code.

### Prompt
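The snippets below assume a `model` and `tokenizer` are already available. A minimal loading sketch using `transformers` and `peft` (not part of the original card; repo ids taken from this page, device settings are assumptions to adjust for your hardware) might look like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/qwen2.5-coder-32b-instruct-bnb-4bit"
adapter_id = "100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter"

# Load the 4-bit quantized base model, then attach this LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
```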
```python
# Prompt templates (Korean); English glosses in the comments below.
GENERAL_QUERY_PREFIX = """당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.

(context)
{context}
"""
# "You are a team member in an organization that turns user input into MySQL
#  queries. Your task is to use the (context) below, which contains the DB name
#  and metadata about the tables in the DB, to write a MySQL query that fits
#  the given question (user_question)."

GENERATE_QUERY_INSTRUCTIONS = """
주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
"""
# "Please write a syntactically correct MySQL query for the given question
#  (user_question)."
```
### Preprocess Functions
```python
def get_conversation_data(examples):
    """Turn a batch of (question, schema, SQL) rows into chat conversations."""
    questions = examples["question"]
    schemas = examples["schema"]
    sql_queries = examples["SQL"]
    convos = []
    for question, schema, sql in zip(questions, schemas, sql_queries):
        conv = [
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ]
        convos.append(conv)
    return {"conversation": convos}

def formatting_prompts_func(examples):
    """Render conversations to plain text with the base model's chat template.
    `tokenizer` is the tokenizer loaded for the base model."""
    convos = examples["conversation"]
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
    return {"text": texts}
```
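As a quick sanity check, the conversation-building step can be exercised on a made-up single-row batch (the fixture strings and simplified prefix constants below are illustrative only, not from the training data):

```python
# Stand-in prefix constants; the real ones are defined in the Prompt section.
GENERAL_QUERY_PREFIX = "(context)\n{context}\n"
GENERATE_QUERY_INSTRUCTIONS = "Write a MySQL query.\n"

def get_conversation_data(examples):
    convos = []
    for question, schema, sql in zip(examples["question"], examples["schema"], examples["SQL"]):
        convos.append([
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ])
    return {"conversation": convos}

batch = {
    "question": ["How many movies are listed?"],
    "schema": ["DB: movie_platform\ntable DDL: CREATE TABLE movies (movie_id INTEGER);"],
    "SQL": ["SELECT COUNT(*) FROM movies"],
}
out = get_conversation_data(batch)
# One conversation per row, with three turns (system/user/assistant).
```

In practice these batch-style functions are typically passed to `datasets.Dataset.map(..., batched=True)`.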
#### Example input

````
<|im_start|>system
당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.

(context)
DB: movie_platform
table DDL: CREATE TABLE `movies` ( `movie_id` INTEGER `movie_title` TEXT `movie_release_year` INTEGER `movie_url` TEXT `movie_title_language` TEXT `movie_popularity` INTEGER `movie_image_url` TEXT `director_id` TEXT `director_name` TEXT `director_url` TEXT PRIMARY KEY (movie_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists`(user_id) FOREIGN KEY (list_id) REFERENCES `lists`(list_id) FOREIGN KEY (user_id) REFERENCES `ratings_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (movie_id) REFERENCES `movies`(movie_id) );

주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
<|im_end|>
<|im_start|>user
가장 인기 있는 영화는 무엇인가요? 그 영화는 언제 개봉되었고 누가 감독인가요?<|im_end|>
<|im_start|>assistant
```sql
SELECT movie_title, movie_release_year, director_name FROM movies ORDER BY movie_popularity DESC LIMIT 1;
```<|im_end|>
````

(The user turn asks: "What is the most popular movie? When was it released, and who is its director?")
### Inference

```python
messages = [
    {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=context) + GENERATE_QUERY_INSTRUCTIONS},
    {"role": "user", "content": "user_question: " + user_question},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=max_new_tokens,
)
# Strip the prompt tokens, keeping only the newly generated completion.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
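Because the assistant turns were formatted with a ```sql fence during training (see the preprocess functions), the decoded `response` usually carries the query inside such a fence. A small post-processing helper (hypothetical, not part of the original card) can recover the bare SQL:

```python
import re

def extract_sql(response: str) -> str:
    """Return the SQL inside a ```sql ... ``` fence, or the stripped
    response unchanged if no fence is found."""
    match = re.search(r"```sql\s*(.*?)\s*```", response, re.DOTALL)
    return match.group(1) if match else response.strip()

example = "```sql\nSELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1;\n```"
print(extract_sql(example))
```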
## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

See the preprocess functions above.

#### Training Hyperparameters

- Training regime: [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.13.2