---
license: llama3
---

# Query

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

BASE_MODEL = "sh2orc/llama-3-korean-8b"

# Load the model and let transformers place it on the available devices
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# "How many bridges are there over the Han River?"
instruction = "ν•œκ°•μ—λŠ” λŒ€κ΅κ°€ λͺ‡ 개 μžˆμ–΄?"

pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=1024)

messages = [
    {"role": "user", "content": instruction},
]

# Render the chat messages into a single prompt string; add_generation_prompt
# appends the assistant header so the model answers as the assistant
prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

outputs = pipe(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_k=10,
    top_p=0.9,
    add_special_tokens=True,
    eos_token_id=[
        pipe.tokenizer.eos_token_id,
        pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]
)

# Print only the completion, stripping the echoed prompt
print(outputs[0]['generated_text'][len(prompt):])
```
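For reference, the string that `apply_chat_template` produces follows the Llama 3 chat format. The sketch below rebuilds that format by hand so you can see what the model actually receives; the exact template is defined by this tokenizer's config, so treat `build_llama3_prompt` as an illustrative helper, not an official API.

```python
def build_llama3_prompt(messages):
    """Render chat messages with Llama 3 special tokens, ending with the
    assistant header so generation continues as the assistant's reply."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                   f"{m['content']}<|eot_id|>")
    # add_generation_prompt=True corresponds to this trailing assistant header
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt(
    [{"role": "user", "content": "ν•œκ°•μ—λŠ” λŒ€κ΅κ°€ λͺ‡ 개 μžˆμ–΄?"}]
)
print(prompt)
```

Each user turn is closed with `<|eot_id|>`, which is why that token is also passed as a terminator when generating.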

# Result

> ν•œκ°•μ—λŠ” 총 8개의 닀리(ꡐ)κ°€ μžˆμŠ΅λ‹ˆλ‹€. κ·Έ 쀑 3κ°œλŠ” 뢁μͺ½μœΌλ‘œ ν–₯ν•΄ 있고, λ‚˜λ¨Έμ§€ 5κ°œλŠ” 남μͺ½μœΌλ‘œ ν–₯ν•΄ μžˆμŠ΅λ‹ˆλ‹€.

(English: "There are a total of 8 bridges over the Han River. Of those, 3 head north and the remaining 5 head south.")