---
license: llama3
---
# Model
- Foundation model: [Bllossom 8B](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B)
- Datasets:
  - [KoAlpaca v1.1a](https://huggingface.co/datasets/beomi/KoAlpaca-v1.1a)
  - [jojo0217/korean_safe_conversation](https://huggingface.co/datasets/jojo0217/korean_safe_conversation)
# Query
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

BASE_MODEL = "sh2orc/llama-3-korean-8b"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

instruction = "ν•œκ°•μ—λŠ” λŒ€κ΅κ°€ λͺ‡ 개 μžˆμ–΄?"  # "How many bridges are there on the Han River?"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,
)

messages = [
    {"role": "user", "content": instruction},
]

# Build the Llama 3 chat prompt from the message list
prompt = pipe.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = pipe(
    prompt,
    do_sample=True,
    temperature=0.8,
    top_k=10,
    top_p=0.9,
    add_special_tokens=True,
    eos_token_id=[
        pipe.tokenizer.eos_token_id,
        pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ],
)

# Print only the newly generated text after the prompt
print(outputs[0]['generated_text'][len(prompt):])
```
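The same query can also be run without the `pipeline` wrapper by tokenizing the chat template and calling `model.generate()` directly. The sketch below is an assumption-based alternative, not part of the original card; it reuses the `model`, `tokenizer`, and `messages` objects defined above and mirrors the same sampling parameters.

```python
# Minimal sketch (assumes model, tokenizer, and messages from the example above)
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the regular EOS token or Llama 3's <|eot_id|> turn delimiter
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

output = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.8,
    top_k=10,
    top_p=0.9,
    eos_token_id=terminators,
)

# Decode only the tokens generated after the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```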
# Result
<pre>
ν•œκ°•μ—λŠ” 총 8개의 닀리(ꡐ)κ°€ μžˆμŠ΅λ‹ˆλ‹€. κ·Έ 쀑 3κ°œλŠ” 뢁μͺ½μœΌλ‘œ ν–₯ν•΄ 있고, λ‚˜λ¨Έμ§€ 5κ°œλŠ” 남μͺ½μœΌλ‘œ ν–₯ν•΄ μžˆμŠ΅λ‹ˆλ‹€.
</pre>

(English translation: "The Han River has a total of 8 bridges. Of these, 3 face north and the remaining 5 face south.")