---
base_model: unsloth/qwen2.5-coder-32b-instruct-bnb-4bit
library_name: peft
datasets:
- 100suping/ko-bird-sql-schema
- won75/text_to_sql_ko
language:
- ko
pipeline_tag: text-generation
tags:
- SQL
- lora
- adapter
- instruction-tuning
---

# 100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter

This repo contains a **LoRA (Low-Rank Adaptation) adapter** for [unsloth/qwen2.5-coder-32b-instruct-bnb-4bit](https://huggingface.co/unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit).

The adapter was created through **instruction tuning** to improve the base model's SQL generation capability for Korean questions in a multi-DB context.

## Model Details

### Model Description

- **Base Model:** unsloth/Qwen2.5-Coder-32B-Instruct
- **Task:** Instruction following (Korean)
- **Language:** Korean
- **Training Data:** 100suping/ko-bird-sql-schema, won75/text_to_sql_ko
- **Model type:** Causal language model
- **Language(s) (NLP):** Multi-language

## How to Get Started with the Model

To use this LoRA adapter, refer to the following code:

### Load Adapter

```python
from transformers import BitsAndBytesConfig

def get_bnb_config(bit=8):
    """Return an 8-bit quantization config for bit == 8; any other value falls back to 4-bit."""
    if bit == 8:
        return BitsAndBytesConfig(load_in_8bit=True)
    print(f"You passed bit={bit}. Any value other than 8 falls back to the 4-bit config.")
    return BitsAndBytesConfig(load_in_4bit=True)
```

```python
from unsloth import FastLanguageModel

model_name = "unsloth/Qwen2.5-Coder-32B-Instruct"
adapter_revision = "checkpoint-200"  # checkpoint-100 ~ checkpoint-350, or main (= checkpoint-384)
bnb_config = get_bnb_config(bit=8)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    dtype=None,
    quantization_config=bnb_config,
)
model.load_adapter("100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter", revision=adapter_revision)
```

### Prompt

The system prompt tells the model, in Korean, that it is a team member who converts user input into MySQL queries, using the DB name and table metadata given in `(context)`; `GENERATE_QUERY_INSTRUCTIONS` then asks for a syntactically correct MySQL query for the given `user_question`:

```python
GENERAL_QUERY_PREFIX = """당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.
(context)
{context}
"""

GENERATE_QUERY_INSTRUCTIONS = """
주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
"""
```

### Example input

The user question in this example asks: "What is the most popular movie? When was it released, and who is the director?"

```
<|im_start|>system
당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.

(context)
DB: movie_platform
table DDL: CREATE TABLE `movies` (
    `movie_id` INTEGER,
    `movie_title` TEXT,
    `movie_release_year` INTEGER,
    `movie_url` TEXT,
    `movie_title_language` TEXT,
    `movie_popularity` INTEGER,
    `movie_image_url` TEXT,
    `director_id` TEXT,
    `director_name` TEXT,
    `director_url` TEXT,
    PRIMARY KEY (movie_id),
    FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id),
    FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id),
    FOREIGN KEY (user_id) REFERENCES `lists`(user_id),
    FOREIGN KEY (list_id) REFERENCES `lists`(list_id),
    FOREIGN KEY (user_id) REFERENCES `ratings_users`(user_id),
    FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id),
    FOREIGN KEY (movie_id) REFERENCES `movies`(movie_id)
);

주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
<|im_end|>
<|im_start|>user
가장 인기 있는 영화는 무엇인가요?
그 영화는 언제 개봉되었고 누가 감독인가요?<|im_end|>
<|im_start|>assistant
```sql
SELECT movie_title, movie_release_year, director_name
FROM movies
ORDER BY movie_popularity DESC
LIMIT 1;
```
<|im_end|>
```

### Inference

The snippet below assumes `context` (the DB schema text) and `user_question` are already defined, and that `max_new_tokens` is set to a suitable generation budget:

```python
messages = [
    {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=context) + GENERATE_QUERY_INSTRUCTIONS},
    {"role": "user", "content": "user_question: " + user_question},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=max_new_tokens,
)
# Keep only the newly generated tokens, stripping the prompt.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
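As the example above shows, the model returns the generated query wrapped in an sql-tagged code fence. A small helper (not part of this repo, shown only as a sketch) can strip that fence to recover the raw SQL:

```python
import re

def extract_sql(response: str) -> str:
    """Return the SQL inside an sql-tagged code fence, or the stripped text if no fence is found."""
    match = re.search(r"```sql\s*(.*?)\s*```", response, re.DOTALL)
    return match.group(1) if match else response.strip()

response = "```sql\nSELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1;\n```"
print(extract_sql(response))
# SELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1;
```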
## Training Details

### Training Data

[More Information Needed]

### Training Procedure

[More Information Needed]

### Preprocess Functions

```python
def get_conversation_data(examples):
    # Batched mapping: each column value is a list of examples.
    questions = examples["question"]
    schemas = examples["schema"]
    sql_queries = examples["SQL"]
    convos = []
    for question, schema, sql in zip(questions, schemas, sql_queries):
        conv = [
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ]
        convos.append(conv)
    return {"conversation": convos}


def formatting_prompts_func(examples):
    # Render each conversation to plain text with the tokenizer's chat template.
    convos = examples["conversation"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in convos
    ]
    return {"text": texts}
```

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.13.2
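As a quick sanity check, the first preprocessing function above can be exercised on a toy batch. The shortened English prompt constants here are hypothetical stand-ins for the Korean ones defined earlier, used only so the snippet is self-contained:

```python
# Hypothetical shortened, English stand-ins for the prompt constants defined above.
GENERAL_QUERY_PREFIX = "You convert user input into MySQL queries.\n(context)\n{context}\n"
GENERATE_QUERY_INSTRUCTIONS = "\nWrite a syntactically correct MySQL query for the given question.\n"

def get_conversation_data(examples):
    # Batched mapping: every column value is a list.
    convos = []
    for question, schema, sql in zip(examples["question"], examples["schema"], examples["SQL"]):
        convos.append([
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ])
    return {"conversation": convos}

batch = {
    "question": ["가장 인기 있는 영화는 무엇인가요?"],
    "schema": ["CREATE TABLE `movies` (`movie_id` INTEGER, `movie_title` TEXT, `movie_popularity` INTEGER);"],
    "SQL": ["SELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1"],
}
out = get_conversation_data(batch)
print(out["conversation"][0][2]["content"])
```

With a real 🤗 Datasets object, the same functions would be applied as `dataset.map(get_conversation_data, batched=True)` followed by `.map(formatting_prompts_func, batched=True)`.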