---
base_model: unsloth/qwen2.5-coder-32b-instruct-bnb-4bit
library_name: peft
datasets:
- 100suping/ko-bird-sql-schema
- won75/text_to_sql_ko
language:
- ko
pipeline_tag: text-generation
tags:
- SQL
- lora
- adapter
- instruction-tuning
---

# 100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter

This repo contains a **LoRA (Low-Rank Adaptation) adapter** for [unsloth/qwen2.5-coder-32b-instruct-bnb-4bit](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit).

The adapter was created through **instruction tuning** to improve the base model's SQL generation capability for Korean questions in a multi-DB context.

## Model Details

### Model Description

- **Base Model:** unsloth/Qwen2.5-Coder-32B-Instruct
- **Task:** Instruction following (Korean)
- **Language:** Korean
- **Training Data:** 100suping/ko-bird-sql-schema, won75/text_to_sql_ko
- **Model type:** Causal language model
- **Language(s) (NLP):** Multi-language

## How to Get Started with the Model

To use this LoRA adapter, refer to the following code:

### Load Adapter

```python
from transformers import BitsAndBytesConfig

def get_bnb_config(bit=8):
    """Return an 8-bit quantization config for bit == 8; any other value falls back to 4-bit."""
    if bit == 8:
        return BitsAndBytesConfig(load_in_8bit=True)
    print(f"You passed bit={bit}. Any value other than 8 falls back to the 4-bit config.")
    return BitsAndBytesConfig(load_in_4bit=True)
```

```python
from unsloth import FastLanguageModel

model_name = "unsloth/Qwen2.5-Coder-32B-Instruct"
adapter_revision = "checkpoint-200"  # checkpoint-100 ~ checkpoint-350, or main (= checkpoint-384)
bnb_config = get_bnb_config(bit=8)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    dtype=None,
    quantization_config=bnb_config,
)
model.load_adapter("100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter", revision=adapter_revision)
```

### Prompt

The system prompt tells the model, in Korean, that it is a team member who converts user input into MySQL queries, using the DB name and table metadata given in `(context)`; `GENERATE_QUERY_INSTRUCTIONS` then asks for a syntactically correct MySQL query for the given `user_question`:

```python
GENERAL_QUERY_PREFIX = """당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.
(context)
{context}
"""

GENERATE_QUERY_INSTRUCTIONS = """
주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
"""
```

### Example input

The user question in this example asks: "What is the most popular movie? When was it released, and who is the director?"

```
<|im_start|>system
당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.

(context)
DB: movie_platform
table DDL: CREATE TABLE `movies` (
    `movie_id` INTEGER,
    `movie_title` TEXT,
    `movie_release_year` INTEGER,
    `movie_url` TEXT,
    `movie_title_language` TEXT,
    `movie_popularity` INTEGER,
    `movie_image_url` TEXT,
    `director_id` TEXT,
    `director_name` TEXT,
    `director_url` TEXT,
    PRIMARY KEY (movie_id),
    FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id),
    FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id),
    FOREIGN KEY (user_id) REFERENCES `lists`(user_id),
    FOREIGN KEY (list_id) REFERENCES `lists`(list_id),
    FOREIGN KEY (user_id) REFERENCES `ratings_users`(user_id),
    FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id),
    FOREIGN KEY (movie_id) REFERENCES `movies`(movie_id)
);

주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
<|im_end|>
<|im_start|>user
가장 인기 있는 영화는 무엇인가요?
그 영화는 언제 개봉되었고 누가 감독인가요?<|im_end|>
<|im_start|>assistant
```sql
SELECT movie_title, movie_release_year, director_name
FROM movies
ORDER BY movie_popularity DESC
LIMIT 1;
```
<|im_end|>
```

### Inference

The snippet below assumes `context` (the DB schema text) and `user_question` are already defined, and that `max_new_tokens` is set to a suitable generation budget:

```python
messages = [
    {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=context) + GENERATE_QUERY_INSTRUCTIONS},
    {"role": "user", "content": "user_question: " + user_question},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=max_new_tokens,
)
# Keep only the newly generated tokens, stripping the prompt.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
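As the example above shows, the model returns the generated query wrapped in an sql-tagged code fence. A small helper (not part of this repo, shown only as a sketch) can strip that fence to recover the raw SQL:

```python
import re

def extract_sql(response: str) -> str:
    """Return the SQL inside an sql-tagged code fence, or the stripped text if no fence is found."""
    match = re.search(r"```sql\s*(.*?)\s*```", response, re.DOTALL)
    return match.group(1) if match else response.strip()

response = "```sql\nSELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1;\n```"
print(extract_sql(response))
# SELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1;
```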
## Training Details

### Training Data

[More Information Needed]

### Training Procedure

[More Information Needed]

### Preprocess Functions

```python
def get_conversation_data(examples):
    # Batched mapping: each column value is a list of examples.
    questions = examples["question"]
    schemas = examples["schema"]
    sql_queries = examples["SQL"]
    convos = []
    for question, schema, sql in zip(questions, schemas, sql_queries):
        conv = [
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ]
        convos.append(conv)
    return {"conversation": convos}


def formatting_prompts_func(examples):
    # Render each conversation to plain text with the tokenizer's chat template.
    convos = examples["conversation"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in convos
    ]
    return {"text": texts}
```

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.13.2
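As a quick sanity check, the first preprocessing function above can be exercised on a toy batch. The shortened English prompt constants here are hypothetical stand-ins for the Korean ones defined earlier, used only so the snippet is self-contained:

```python
# Hypothetical shortened, English stand-ins for the prompt constants defined above.
GENERAL_QUERY_PREFIX = "You convert user input into MySQL queries.\n(context)\n{context}\n"
GENERATE_QUERY_INSTRUCTIONS = "\nWrite a syntactically correct MySQL query for the given question.\n"

def get_conversation_data(examples):
    # Batched mapping: every column value is a list.
    convos = []
    for question, schema, sql in zip(examples["question"], examples["schema"], examples["SQL"]):
        convos.append([
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ])
    return {"conversation": convos}

batch = {
    "question": ["가장 인기 있는 영화는 무엇인가요?"],
    "schema": ["CREATE TABLE `movies` (`movie_id` INTEGER, `movie_title` TEXT, `movie_popularity` INTEGER);"],
    "SQL": ["SELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1"],
}
out = get_conversation_data(batch)
print(out["conversation"][0][2]["content"])
```

With a real 🤗 Datasets object, the same functions would be applied as `dataset.map(get_conversation_data, batched=True)` followed by `.map(formatting_prompts_func, batched=True)`.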