metadata
license: mit
language:
  - ko
  - en
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Llama-70B
library_name: transformers

DeepSeek-llama3.3-Bllossom

The DeepSeek-Bllossom Series consists of models that were further trained to address the language mixing and degraded multilingual performance of the original DeepSeek-R1-Distill Series.

DeepSeek-llama3.3-Bllossom-70B is built on DeepSeek-R1-distill-Llama-70B and was developed to improve reasoning performance in Korean-language settings.

This is the first model produced jointly by UNIVA and the Bllossom team.

| Model | Base Model | Download |
|---|---|---|
| DeepSeek-qwen-Bllossom-1.5B | DeepSeek-R1-Distill-Qwen-1.5B | Coming soon |
| DeepSeek-qwen-Bllossom-7B | DeepSeek-R1-Distill-Qwen-7B | Coming soon |
| DeepSeek-llama3.1-Bllossom-8B | DeepSeek-R1-Distill-Llama-8B | 🤗 HuggingFace |
| DeepSeek-qwen-Bllossom-14B | DeepSeek-R1-Distill-Qwen-14B | Coming soon |
| DeepSeek-qwen-Bllossom-32B | DeepSeek-R1-Distill-Qwen-32B | Coming soon |
| DeepSeek-llama3.3-Bllossom-70B | DeepSeek-R1-Distill-Llama-70B | 🤗 HuggingFace |

1. Introduction

DeepSeek-llama3.3-Bllossom-70B is built on DeepSeek-R1-distill-Llama-70B and was developed to overcome a limitation of the base model, which was trained mainly on English and Chinese data. In particular, the original DeepSeek-R1-distill-Llama-70B suffers a large drop in performance when it reasons in Korean. To address this, DeepSeek-Bllossom was further trained to carry out its internal reasoning in English while producing the final user-facing response in the language of the input. This substantially improves reasoning performance in Korean-language settings.

Training used Korean and English reasoning data and covered a variety of domains beyond the STEM data that was mainly used to train the original DeepSeek-R1 models. Throughout dataset design and model training, the main goal for DeepSeek-llama3.3-Bllossom was to provide more accurate and reliable reasoning results in Korean-language use.

A smaller 8B model in the DeepSeek-Bllossom Series is available here: DeepSeek-R1-distill-Llama-Bllossom-8B


2. Post-training

DeepSeek-llama3.3-Bllossom was post-trained on a variety of reasoning data that we built in-house. In this process, we applied a method that distills the strong reasoning ability and Korean-language capability of a large-scale model into DeepSeek-R1-distill-Llama-70B. This complements the base model's performance and optimizes it to generate more accurate and reliable responses to complex reasoning problems.


3. Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B")

system='''
You are a highly capable assistant. For every user question, follow these instructions exactly:
    1.	First, think through the problem step-by-step in English. Enclose all of your internal reasoning between <think> and </think> tags. This chain-of-thought should detail your reasoning process.
    2.	After the closing </think> tag, provide your final answer.
    3.	Do not include any additional text or commentary outside of this format.
    4.	Your output should strictly follow this structure:

<think>
[Your detailed step-by-step reasoning in English]
</think>
[Your final answer]
'''

# Korean prompt. In English: "Cheolsu, Younghee, and Minsu received scores in three games.
# Younghee's score is twice Minsu's score, and Minsu's score is four times Cheolsu's.
# If Cheolsu scored 10 points, calculate the average score of the three."
text="철수, 영희, 민수가 3회의 게임에서 점수를 받았습니다. 영희의 점수는 민수의 점수의 두 배이며, 민수의 점수는 철수의 4배입니다. 철수가 10점을 받았다면 이 3명의 평균 점수를 계산하세요."
chat = [
    {"role": "system", "content": system},
    {"role": "user", "content": text}
]

prompt=tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=True
)

if "token_type_ids" in model_inputs:
    del model_inputs["token_type_ids"]

model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}

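generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192,
)

# Decode only the newly generated tokens (everything after the prompt)
output_ids = generated_ids[0][model_inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))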
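
The system prompt above asks the model to put its English reasoning between <think> and </think> and the final answer after the closing tag. The following is a minimal sketch of how a decoded response could be split into those two parts. The helper name `split_reasoning` and the sample string are hypothetical and not part of this repository, and the sketch assumes the opening <think> tag may be missing from the decoded text in some setups, so it keys on the closing tag only.

```python
from typing import Optional, Tuple


def split_reasoning(response: str) -> Tuple[Optional[str], str]:
    """Split a decoded response into (reasoning, final_answer).

    Hypothetical helper: returns (None, response) when no </think> tag is found.
    """
    head, sep, tail = response.partition("</think>")
    if not sep:
        # No closing tag: treat the whole text as the final answer
        return None, response.strip()
    # Drop the opening tag if present; it may be absent from the decoded text
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Made-up sample response, for illustration only
sample = "<think>\nCharlie has 10, Bob has 40, Alice has 80.\n</think>\nThe average is 130/3."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the reasoning separate makes it easy to show only the final answer to end users while still logging the chain of thought.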

4. Model Response Comparison

Example 1

Prompt (Korean; translated):
Alice, Bob, and Charlie received scores in three games.
Alice's score is twice Bob's score, and Bob's score is four times Charlie's.
If Charlie scored 10 points, calculate the average score of the three.

DeepSeek-R1-distill-Llama-70B (answered in Korean with Chinese characters mixed in, "세人的"; translated):
Charlie's score is 10 points.
Bob's score is four times Charlie's, so Bob's score is
$$
4 \times 10 = 40
$$
points.
Alice's score is twice Bob's score, so Alice's score is
$$
2 \times 40 = 80
$$
points.
The three people's (세人的) total score is
$$
10 + 40 + 80 = 130
$$
points.
The average score is
$$
\frac{130}{3} \approx 43.33$ points.
$$
\boxed{43.33}
$$
Charlie's score is 10 points.

DeepSeek-llama3.3-Bllossom-70B (answered in Korean; translated):
Charlie's score is given as 10 points. Bob's score is four times Charlie's score, which is calculated as follows:
$$
4 \times 10 = 40
$$
Alice's score is twice Bob's score, which is:
$$
2 \times 40 = 80
$$
The total score of the three people is:
$$
10 + 40 + 80 = 130
$$
The average score of the three people is calculated by dividing the total score by the number of people:
$$
\frac{130}{3}
$$
Therefore, the average score of the three people is
$$
\boxed{\dfrac{130}{3}}
$$
์†Œ์ˆ˜๊ฐ€ ๋ฌดํ•œํ•˜๋‹ค๋Š” ์‚ฌ์‹ค์„ 3๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ์ฆ๋ช…ํ•˜์‹œ์˜ค.
ไปฅไธ‹ๆ˜ฏ็”จไธ‰็งๆ–นๆณ•่ฏๆ˜Ž็ด ๆ•ฐๆ— ้™็š„่ฏฆ็ป†ๆญฅ้ชค๏ผš
### ๆ–นๆณ•ไธ€๏ผšๆฌงๅ‡ ้‡Œๅพ—็š„ๅ่ฏๆณ•
1. ๅ‡่ฎพ๏ผšๅ‡่ฎพ็ด ๆ•ฐๆ˜ฏๆœ‰้™็š„๏ผŒไธ”ๆ‰€ๆœ‰็ด ๆ•ฐไธบ $ p_1, p_2, p_3, \ldots, p_n $ใ€‚
2. ๆž„้€ ๆ•ฐ๏ผšๆž„้€ ไธ€ไธชๆ–ฐๆ•ฐ $ N = (p_1 \times p_2 \times p_3 \times \ldots \times p_n) + 1 $ใ€‚
3. ๅˆ†ๆžN็š„ๆ€ง่ดจ๏ผš
- ๅฆ‚ๆžœNๆ˜ฏ็ด ๆ•ฐ๏ผŒๅˆ™NไธๅœจๅŽŸๅ‡่ฎพ็š„็ด ๆ•ฐๅˆ—่กจไธญ๏ผŒไธŽๅ‡่ฎพ็Ÿ›็›พใ€‚
- ๅฆ‚ๆžœNๆ˜ฏๅˆๆ•ฐ๏ผŒๅˆ™ๅญ˜ๅœจๆŸไธช็ด ๆ•ฐ $ p $ ๆ•ด้™คNใ€‚ไฝ†็”ฑไบŽN้™คไปฅไปปไฝ• $ p_i $ ้ƒฝไฝ™1๏ผŒๆ•… $ p $ ไธๅœจๅŽŸๅˆ—่กจไธญ๏ผŒๅŒๆ ท็Ÿ›็›พใ€‚
4. ็ป“่ฎบ๏ผšๅ› ๆญค๏ผŒ็ด ๆ•ฐๅฟ…้กปๆ˜ฏๆ— ้™็š„ใ€‚
### ๆ–นๆณ•ไบŒ๏ผš่ดน้ฉฌๆ•ฐ็š„ๆ€ง่ดจ
1. ๅฎšไน‰๏ผš่ดน้ฉฌๆ•ฐๅฝขๅฆ‚ $ F_k = 2^{2^k} + 1 $๏ผŒๅ…ถไธญ $ k $ ไธบ้ž่ดŸๆ•ดๆ•ฐใ€‚
2. ๅ‡่ฎพ๏ผšๅ‡่ฎพ็ด ๆ•ฐๆ˜ฏๆœ‰้™็š„๏ผŒไธ”ๆ‰€ๆœ‰็ด ๆ•ฐไธบ $ p_1, p_2, \ldots, p_n $ใ€‚
3. ๆž„้€ ่ดน้ฉฌๆ•ฐ๏ผšๅฏนไบŽๆฏไธช $ k $๏ผŒๆž„้€  $ F_k $ใ€‚ๅฆ‚ๆžœ $ F_k $ ๆ˜ฏ็ด ๆ•ฐ๏ผŒๅˆ™ไธๅœจๅŽŸๅˆ—่กจไธญ๏ผ›ๅฆ‚ๆžœๆ˜ฏๅˆๆ•ฐ๏ผŒๅ…ถ็ด ๅ› ๆ•ฐไธๅœจๅŽŸๅˆ—่กจไธญใ€‚
4. ็ป“่ฎบ๏ผšๅ› ๆญค๏ผŒ็ด ๆ•ฐๅฟ…้กปๆ˜ฏๆ— ้™็š„ใ€‚
### ๆ–นๆณ•ไธ‰๏ผšๆฌงๆ‹‰ๅ‡ฝๆ•ฐไธŽ็ด ๆ•ฐๆ— ้™ๆ€ง
1. ๆฌงๆ‹‰ๅ‡ฝๆ•ฐ๏ผšๆฌงๆ‹‰ๅ‡ฝๆ•ฐ $ \phi(n) $ ่กจ็คบๅฐไบŽ $ n $ ไธ”ไธŽ $ n $ ไบ’่ดจ็š„ๆ•ฐ็š„ไธชๆ•ฐใ€‚
2. ๅ‡่ฎพ๏ผšๅ‡่ฎพ็ด ๆ•ฐๆœ‰้™๏ผŒ่ฎฐไธบ $ p_1, p_2, \ldots, p_n $ใ€‚
3. ๆž„้€ ๆ•ฐ๏ผš่€ƒ่™‘ๅฝขๅฆ‚ $ N = p_1^{k_1} p_2^{k_2} \ldots p_n^{k_n} $ ็š„ๆ•ฐ๏ผŒๅ…ถๆฌงๆ‹‰ๅ‡ฝๆ•ฐไธบ $ \phi(N) = N \prod_{i=1}^{n} \left(1 - \frac{1}{p_i}\right) $ใ€‚
4. ๅˆ†ๆž๏ผšๅฆ‚ๆžœ็ด ๆ•ฐๆœ‰้™๏ผŒ$ \prod_{p \text{็ด ๆ•ฐ}} \left(1 - \frac{1}{p}\right) $ ๆ”ถๆ•›๏ผŒไฝ†ๅฎž้™…ไธŠ่ฏฅ็งฏๅ‘ๆ•ฃ๏ผŒๅฏผ่‡ด็Ÿ›็›พใ€‚
5. ็ป“่ฎบ๏ผšๅ› ๆญค๏ผŒ็ด ๆ•ฐๅฟ…้กปๆ˜ฏๆ— ้™็š„ใ€‚
่ฟ™ไธ‰็งๆ–นๆณ•ๅ„่‡ชไปŽไธๅŒ็š„่ง’ๅบฆ่ฏๆ˜Žไบ†็ด ๆ•ฐ็š„ๆ— ้™ๆ€ง๏ผŒๅฑ•็คบไบ†ๆ•ฐ่ฎบไธญ็ด ๆ•ฐๆ— ้™ๆ€ง็š„ๆทฑๅˆปๆ€งใ€‚

์†Œ์ˆ˜๊ฐ€ ๋ฌดํ•œํ•˜๋‹ค๋Š” ๊ฒƒ์„ 3๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ์ฆ๋ช…ํ•˜๊ธฐ ์œ„ํ•ด, ์šฐ๋ฆฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ ‘๊ทผ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
1. ์œ ํด๋ฆฌ๋“œ์˜ ๋ชจ์ˆœ ์ฆ๋ช…:
- ๊ฐ€์ •: ์†Œ์ˆ˜๋Š” ์œ ํ•œํ•œ ๊ฐœ์ˆ˜ $ p_1, p_2, \dots, p_n $์ด๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.
- ๊ตฌ์„ฑ: $ P = p_1 \times p_2 \times \dots \times p_n + 1 $์œผ๋กœ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
- ๊ฒฐ๋ก : $ P $๊ฐ€ ์†Œ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ ๋ชฉ๋ก์— ์—†๋Š” ์ƒˆ๋กœ์šด ์†Œ์ˆ˜์ž…๋‹ˆ๋‹ค. ๋งŒ์•ฝ $ P $๊ฐ€ ํ•ฉ์„ฑ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ ๋ชฉ๋ก์— ์—†๋Š” ์†Œ์ˆ˜๋ฅผ ํฌํ•จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๊ฒฝ์šฐ ๋ชจ๋‘ ๊ฐ€์ •์— ๋ชจ์ˆœ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์†Œ์ˆ˜๋Š” ๋ฌดํ•œํ•ฉ๋‹ˆ๋‹ค.
2. ํŒฉํ† ๋ฆฌ์–ผ ์ ‘๊ทผ๋ฒ•:
- ๊ฐ€์ •: ์†Œ์ˆ˜๋Š” ์œ ํ•œํ•œ ๊ฐœ์ˆ˜ $ p_1, p_2, \dots, p_n $์ด๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.
- ๊ตฌ์„ฑ: $ n! + 1 $์„ ๊ณ ๋ คํ•˜์‹ญ์‹œ์˜ค, ์—ฌ๊ธฐ์„œ $ n $์€ ๊ฐ€์žฅ ํฐ ์†Œ์ˆ˜์ž…๋‹ˆ๋‹ค.
- ๊ฒฐ๋ก : $ n! $์€ ๋ชจ๋“  $ p_i $๋กœ ๋‚˜๋ˆ„์–ด ๋–จ์–ด์ง€๋ฏ€๋กœ, $ n! + 1 $์€ $ p_i $ ์ค‘ ์–ด๋Š ๊ฒƒ๋„ ํฌํ•จํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ $ n! + 1 $์ด ์†Œ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ $ n $๋ณด๋‹ค ํฐ ์ƒˆ๋กœ์šด ์†Œ์ˆ˜์ž…๋‹ˆ๋‹ค. ๋งŒ์•ฝ $ n! + 1 $์ด ํ•ฉ์„ฑ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ $ n $๋ณด๋‹ค ํฐ ์†Œ์ˆ˜๋ฅผ ํฌํ•จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๊ฒฝ์šฐ ๋ชจ๋‘ ๊ฐ€์ •์— ๋ชจ์ˆœ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์†Œ์ˆ˜๋Š” ๋ฌดํ•œํ•ฉ๋‹ˆ๋‹ค.
3. ์˜ค์ผ๋Ÿฌ์˜ ์ œํƒ€ ํ•จ์ˆ˜ ์ฆ๋ช…:
- ๊ฐœ๋…: ์˜ค์ผ๋Ÿฌ๋Š” ์ œํƒ€ ํ•จ์ˆ˜ $ \zeta(s) = \sum_{k=1}^\infty \frac{1}{k^s} $๊ฐ€ ์†Œ์ˆ˜์— ๋Œ€ํ•œ ๊ณฑ์œผ๋กœ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค: $ \zeta(s) = \prod_{p \text{ ์†Œ์ˆ˜}} \frac{1}{1 - \frac{1}{p^s}} $.
- ๊ฒฐ๋ก : ๋งŒ์•ฝ ์†Œ์ˆ˜๊ฐ€ ์œ ํ•œํ•˜๋ฉด, ๊ณฑ์€ ์œ ํ•œํ•ด์•ผ ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ $ s $๊ฐ€ 1๋ณด๋‹ค ํฐ ๊ฐ’์„ ํ–ฅํ•ด ์ ‘๊ทผํ•  ๋•Œ, $ \zeta(s) $๋Š” ๋ฌดํ•œ๋Œ€๋กœ ๋ฐœ์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ๊ณฑ์ด ๋ฐœ์‚ฐํ•ด์•ผ ํ•จ์„ ์š”๊ตฌํ•˜๋ฉฐ, ๋”ฐ๋ผ์„œ ์†Œ์ˆ˜๋Š” ๋ฌดํ•œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
๊ฐ ๋ฐฉ๋ฒ•์€ ์†Œ์ˆ˜๊ฐ€ ๋ฌดํ•œํ•˜๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ฆํ•˜๋ฉฐ, ์„œ๋กœ ๋‹ค๋ฅธ ์ˆ˜ํ•™์  ์›์น™์— ๊ธฐ๋ฐ˜์„ ๋‘ก๋‹ˆ๋‹ค. ์œ ํด๋ฆฌ๋“œ์˜ ์ฆ๋ช…์€ ๋ชจ์ˆœ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค, ํŒฉํ† ๋ฆฌ์–ผ ์ ‘๊ทผ๋ฒ•์€ ๊ฐ„๋‹จํ•œ ์ˆ˜๋ก ์„ ์‚ฌ์šฉํ•˜๋ฉฐ, ์˜ค์ผ๋Ÿฌ์˜ ์ฆ๋ช…์€ ํ•ด์„์  ์ˆ˜๋ก ์˜ ๋„๊ตฌ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์„ธ ๋ฐฉ๋ฒ• ๋ชจ๋‘ ์†Œ์ˆ˜ ๋ฌดํ•œ์„ฑ์— ๋Œ€ํ•œ ์ดํ•ด๋ฅผ ๊ฐ•ํ™”ํ•ฉ๋‹ˆ๋‹ค.

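The two boxed answers in the first comparison example, 43.33 and 130/3, describe the same quantity at different precision. The short check below is an illustrative sketch (not part of the original model card) that recomputes the scores with exact fractions; the variable names are chosen here for readability only.

```python
from fractions import Fraction

# Scores as stated in the prompt of the first example
charlie = 10
bob = 4 * charlie      # Bob scored four times Charlie's score
alice = 2 * bob        # Alice scored twice Bob's score

# Exact average of the three scores
average = Fraction(charlie + bob + alice, 3)

print(average)                    # exact value as a fraction
print(round(float(average), 2))   # rounded to two decimal places
```

The exact fraction matches the answer boxed by DeepSeek-llama3.3-Bllossom-70B, and its two-decimal rounding matches the value boxed by the base model.
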
5. License

This code repository and the model weights are licensed under the MIT License. The DeepSeek-Bllossom series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

  • DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license.
  • DeepSeek-llama3.3-Bllossom-70B is derived from DeepSeek-R1-Distill-Llama-70B and is originally licensed under the llama3.3 license.

6. Contributor

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].