tags:
- slm
- conversation
- chat
- gguf
base_model:
- dnotitia/Llama-DNA-1.0-8B-Instruct
library_name: transformers
pipeline_tag: text-generation
---

# DNA 1.0 8B Instruct

<p align="center">
  <img src="assets/dna-logo.png" width="400" style="margin: 40px auto;">
</p>
<br>

## Introduction

We introduce **DNA 1.0 8B Instruct**, a state-of-the-art (**SOTA**) bilingual language model developed and released by **Dnotitia Inc.** Based on the Llama architecture, it is optimized for Korean language understanding and generation while maintaining strong English capabilities.

DNA 1.0 8B Instruct was developed through the following stages:

- **Model merging via SLERP:** merged with Llama 3.1 8B Instruct using spherical linear interpolation to enhance performance.
- **Knowledge distillation (KD):** distilled from Llama 3.1 405B as the teacher model to improve knowledge representation.
- **Continual pre-training (CPT):** trained on a high-quality Korean dataset to strengthen Korean language capabilities.
- **Supervised fine-tuning (SFT):** fine-tuned on curated data to align with human preferences.
- **Direct preference optimization (DPO):** optimized to improve instruction-following for better user interaction.

The model supports long-context processing of up to **131,072 tokens (128K)**, enabling it to handle extensive conversational histories and long documents effectively.
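The SLERP merge described above interpolates between two checkpoints along the sphere rather than along a straight line, which preserves weight-vector magnitudes better than plain averaging. A minimal illustrative sketch on a single weight vector (not the actual merge code, which applies this per-tensor across full checkpoints):

```python
import math

def slerp(v0, v1, t):
    """Spherical linear interpolation between two weight vectors at fraction t."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_theta = max(-1.0, min(1.0, dot / (n0 * n1)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:  # vectors nearly parallel: plain lerp is numerically stable
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors: both components ~0.7071,
# so the result stays on the unit sphere (a linear average would shrink it).
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(mid)
```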

## Evaluation

We evaluated DNA 1.0 8B Instruct against other prominent language models of similar size across various benchmarks, including Korean-specific tasks and general language understanding metrics.

<br>

<table>
  <tr>
    <th>Language</th>
    <th>Benchmark</th>
    <th>dnotitia<br>DNA 1.0<br>8B Instruct</th>
    <th>EXAONE 3.5<br>7.8B</th>
    <th>Qwen 2.5<br>7B</th>
    <th>Llama 3.1<br>8B</th>
    <th>Mistral<br>7B</th>
  </tr>
  <tr>
    <td rowspan="5">Korean</td>
    <td>KMMLU</td>
    <td align="center"><strong>53.26</strong></td>
    <td align="center">45.30</td>
    <td align="center">45.66</td>
    <td align="center">41.66</td>
    <td align="center">31.45</td>
  </tr>
  <tr>
    <td>KMMLU-Hard</td>
    <td align="center"><strong>29.46</strong></td>
    <td align="center">23.17</td>
    <td align="center">24.78</td>
    <td align="center">20.49</td>
    <td align="center">17.86</td>
  </tr>
  <tr>
    <td>KoBEST</td>
    <td align="center"><strong>83.40</strong></td>
    <td align="center">79.05</td>
    <td align="center">78.51</td>
    <td align="center">67.56</td>
    <td align="center">63.77</td>
  </tr>
  <tr>
    <td>Belebele</td>
    <td align="center"><strong>57.99</strong></td>
    <td align="center">40.97</td>
    <td align="center">54.85</td>
    <td align="center">54.70</td>
    <td align="center">40.31</td>
  </tr>
  <tr>
    <td>CSAT QA</td>
    <td align="center">43.32</td>
    <td align="center">40.11</td>
    <td align="center"><strong>45.45</strong></td>
    <td align="center">36.90</td>
    <td align="center">27.27</td>
  </tr>
  <tr>
    <td rowspan="3">English</td>
    <td>MMLU</td>
    <td align="center">66.64</td>
    <td align="center">65.27</td>
    <td align="center"><strong>74.26</strong></td>
    <td align="center">68.26</td>
    <td align="center">62.04</td>
  </tr>
  <tr>
    <td>MMLU Pro</td>
    <td align="center"><strong>43.05</strong></td>
    <td align="center">40.73</td>
    <td align="center">42.50</td>
    <td align="center">40.92</td>
    <td align="center">33.49</td>
  </tr>
  <tr>
    <td>GSM8K</td>
    <td align="center"><strong>80.52</strong></td>
    <td align="center">65.96</td>
    <td align="center">75.74</td>
    <td align="center">75.82</td>
    <td align="center">49.66</td>
  </tr>
</table>

- The **highest scores** are in **bold**.

<br>
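For a rough summary of the Korean results in the table above, one can take an unweighted mean of the five Korean benchmark scores. This is an informal aggregate for illustration only, not an official metric; the values are copied from the table:

```python
# Korean benchmark scores from the table above
# (order: KMMLU, KMMLU-Hard, KoBEST, Belebele, CSAT QA)
korean_scores = {
    "DNA 1.0 8B Instruct": [53.26, 29.46, 83.40, 57.99, 43.32],
    "EXAONE 3.5 7.8B":     [45.30, 23.17, 79.05, 40.97, 40.11],
    "Qwen 2.5 7B":         [45.66, 24.78, 78.51, 54.85, 45.45],
    "Llama 3.1 8B":        [41.66, 20.49, 67.56, 54.70, 36.90],
    "Mistral 7B":          [31.45, 17.86, 63.77, 40.31, 27.27],
}

# Unweighted mean per model; DNA 1.0 8B Instruct leads on this informal average.
for model, scores in korean_scores.items():
    print(f"{model}: {sum(scores) / len(scores):.2f}")
```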

The benchmarks were evaluated with the following settings:

| Benchmark | Setting | Metric | Framework |
|---|---|---|---|
| KMMLU-Hard | 5-shot | `macro_avg` / `exact_match` | `lm-eval-harness` |
| KoBEST | 5-shot | `macro_avg` / `f1` | `lm-eval-harness` |
| Belebele | 0-shot | `accuracy` | `lm-eval-harness` |
| CSAT QA | 0-shot | `accuracy_normalized` | `lm-eval-harness` |
| MMLU | 5-shot | `macro_avg` / `accuracy` | `lm-eval-harness` |
| MMLU Pro | 5-shot | `macro_avg` / `exact_match` | `lm-eval-harness` |
| GSM8K | 5-shot | `accuracy` / `exact_match` | `lm-eval-harness` |

<br>
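The `macro_avg` aggregation listed above means each category's score is weighted equally, regardless of how many questions the category contains, unlike a micro (pooled) average over all examples. A small sketch with hypothetical category counts:

```python
def macro_avg(per_category_scores):
    """Macro average: mean of per-category scores, each category weighted equally."""
    return sum(per_category_scores.values()) / len(per_category_scores)

def micro_avg(correct, total):
    """Micro average: accuracy pooled over all examples."""
    return sum(correct.values()) / sum(total.values())

# Hypothetical example: two categories of very different sizes
correct = {"STEM": 90, "Humanities": 5}
total = {"STEM": 100, "Humanities": 10}
scores = {c: correct[c] / total[c] for c in total}

print(macro_avg(scores))          # (0.9 + 0.5) / 2 = 0.7
print(micro_avg(correct, total))  # 95 / 110, dominated by the large category
```

The two can differ substantially when category sizes are skewed, which is why the benchmark settings specify which average is reported.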

DNA 1.0 8B Instruct was fine-tuned on approximately 10B tokens of carefully curated data and has undergone extensive instruction tuning to enhance its ability to follow complex instructions and engage in natural conversations.

- **Developed by:** Dnotitia Inc.
- **Supported Languages:** Korean, English
- **Model Release Date:** Dec 10, 2024
- **Vocab Size:** 128,256
- **Context Length:** 131,072 tokens (128K)
- **License:** CC BY-NC 4.0

<p align="center">
  <img src="assets/training-procedure.png" width="600" style="margin: 40px auto;">
</p>

## Quickstart

We offer weights in `F32` and `F16` formats, as well as quantized weights in `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K`, `Q3_K`, and `Q2_K` formats.
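As a rule of thumb, a quantized GGUF file's size is roughly parameters times bits-per-weight. The bits-per-weight figures below are approximate assumptions (actual K-quant files mix several quantization types per tensor, so real sizes vary); they are shown only to illustrate the size trade-off between formats:

```python
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough file size in GB: parameters x bits-per-weight, ignoring metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9

N = 8.0e9  # roughly 8B parameters
# Assumed, approximate bits-per-weight for a few formats
for name, bpw in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K", 4.5)]:
    print(f"{name}: ~{approx_gguf_size_gb(N, bpw):.1f} GB")
```

Lower-bit formats trade some output quality for substantially smaller downloads and memory footprints.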

You can run the GGUF weights with `llama.cpp` as follows:

1. Install `llama.cpp`. Refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp) for details.

2. Download the DNA 1.0 8B Instruct model in GGUF format:

   ```bash
   # Install huggingface_hub if not already installed
   pip install "huggingface_hub[cli]"

   # Download the GGUF weights
   huggingface-cli download dnotitia/Llama-DNA-1.0-8B-Instruct-GGUF \
     --include "Llama-DNA-1.0-8B-Instruct-Q8_0.gguf" \
     --local-dir .
   ```

3. Run the model with `llama.cpp` in conversational mode:

   ```bash
   llama-cli -cnv -m ./Llama-DNA-1.0-8B-Instruct-Q8_0.gguf \
     -p "You are a helpful assistant, Dnotitia DNA."
   ```

## Run Locally
