Update README.md
README.md CHANGED
@@ -133,11 +133,11 @@ library_name: transformers
   range of tasks and domains.
 </p>
 <p>
-
-
-
-
-
+  Fine-tuned on a diverse corpus of 50+ specialized
+  datasets, this series excels at both creative
+  endeavors (like roleplay and co-writing) and
+  technical tasks (such as code generation, tool use,
+  and complex reasoning).
 </p>
 <p>
   The total dataset size is around 1.7B tokens, 1.1B
@@ -146,7 +146,9 @@ library_name: transformers
 <p>
   V1.3.0 introduces multilingual capabilities with
   support for 10 languages and enhanced domain
-  expertise across multiple fields.
+  expertise across multiple fields. The primary
+  language is still English, and that is where peak
+  performance can be expected.
 </p>
 <h3>Multilingual Support</h3>
 <pre class="code-block">
@@ -158,19 +160,35 @@ Hindi Japanese Korean Portuguese Spanish</pre
 BASE MODEL: mistralai/Mistral-Small-3.1-24B-Base-2503
 LICENSE: apache-2.0
 LANGUAGE: Multilingual with 10 supported languages
-CONTEXT LENGTH: 32768 tokens, 131072 with degraded
+CONTEXT LENGTH: 32768 tokens, 131072 with degraded quality</pre
 >
 <h3>Recommended Settings</h3>
 <pre class="code-block">
 TEMPERATURE: 1.0
 TOP_P: 0.95
-MIN_P: 0.05
+MIN_P: 0.05
+REPETITION_PENALTY: 1.04</pre
 >
 <h3>Prompting Format</h3>
 <p>
   The model uses the following format, which I'll refer to as
   "DanChat-2":
 </p>
+<h4>Why not ChatML?</h4>
+<p>
+  ChatML is a widely used and standardized format for
+  LLMs, but it has some limitations: using standard
+  tokens as turn-ownership indicators can impart
+  biases to the model. DanChat-2 uses unique special
+  tokens for each role, which helps reduce these
+  biases and allows the model to more readily adapt
+  to different roles and tasks. It is possible to
+  achieve a similar effect with ChatML, but the
+  technique to do so would be nonstandard to the
+  ChatML format, and for users who do not use the
+  standard "assistant" and "user" roles it would
+  fall apart entirely.
+</p>
 <pre class="code-block">
 <|system|>system prompt<|endoftext|><|user|>Hi there!<|endoftext|><|assistant|>Hey, how can I help?<|endoftext|></pre
 >
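
To make the "Recommended Settings" block concrete, here is a minimal sketch of applying those sampler values at generation time with Hugging Face transformers. The repo ID is a placeholder (the card's repo name is not shown in this diff), and passing `min_p` to `generate` assumes a recent transformers release; treat this as an illustration, not the card's official usage snippet.

```python
# Minimal sketch: applying the card's recommended sampler settings with
# Hugging Face transformers. The repo ID is a placeholder, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-model-repo>"  # placeholder for the model this card describes

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A single-turn DanChat-2 prompt, ending with an open assistant tag.
prompt = (
    "<|system|>You are a helpful assistant.<|endoftext|>"
    "<|user|>Hi there!<|endoftext|>"
    "<|assistant|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,          # TEMPERATURE
    top_p=0.95,               # TOP_P
    min_p=0.05,               # MIN_P (needs a recent transformers version)
    repetition_penalty=1.04,  # REPETITION_PENALTY
    max_new_tokens=256,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```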
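And to show the DanChat-2 template as code: a small hypothetical helper (not from the card) that assembles a multi-turn conversation from the per-role special tokens, leaving a trailing <|assistant|> tag open so the model completes that turn. Only the tags and the <|endoftext|> terminator come from the format shown above; everything else is an assumption.

```python
# Hypothetical helper illustrating the DanChat-2 layout shown above.
ENDOFTEXT = "<|endoftext|>"

def danchat2_prompt(messages: list[dict]) -> str:
    """Build a DanChat-2 prompt from {"role": ..., "content": ...} turns."""
    parts = [f"<|{m['role']}|>{m['content']}{ENDOFTEXT}" for m in messages]
    parts.append("<|assistant|>")  # open turn for the model to complete
    return "".join(parts)

print(danchat2_prompt([
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
]))
# <|system|>system prompt<|endoftext|><|user|>Hi there!<|endoftext|><|assistant|>
```

In practice the repo's tokenizer likely ships a chat template, in which case `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` is the safer path; the helper above just makes the token layout explicit.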