PocketDoc committed f579d37 (verified) · 1 parent: 82aa3bc

Update README.md

Files changed (1):
  1. README.md +26 -8
README.md CHANGED
@@ -133,11 +133,11 @@ library_name: transformers
  range of tasks and domains.
  </p>
  <p>
- This model series has been meticulously fine-tuned
- on a diverse corpus of 40+ specialized datasets to
- excel at both creative endeavors like roleplay and
- co-writing, and technical tasks such as code
- generation, tool use, and complex reasoning.
+ Fine-tuned on a diverse corpus of 50+ specialized
+ datasets, this series excels at both creative
+ endeavors (like roleplay and co-writing) and
+ technical tasks (such as code generation, tool use,
+ and complex reasoning).
  </p>
  <p>
  The total dataset size is around 1.7B tokens, 1.1B
@@ -146,7 +146,9 @@ library_name: transformers
  <p>
  V1.3.0 introduces multilingual capabilities with
  support for 10 languages and enhanced domain
- expertise across multiple fields.
+ expertise across multiple fields. The primary
+ language is still English, and that is where peak
+ performance can be expected.
  </p>
  <h3>Multilingual Support</h3>
  <pre class="code-block">
@@ -158,19 +160,35 @@ Hindi Japanese Korean Portuguese Spanish</pre
  BASE MODEL: mistralai/Mistral-Small-3.1-24B-Base-2503
  LICENSE: apache-2.0
  LANGUAGE: Multilingual with 10 supported languages
- CONTEXT LENGTH: 32768 tokens, 131072 with degraded performance</pre
+ CONTEXT LENGTH: 32768 tokens, 131072 with degraded quality</pre
  >
  <h3>Recommended Settings</h3>
  <pre class="code-block">
  TEMPERATURE: 1.0
  TOP_P: 0.95
- MIN_P: 0.05</pre
+ MIN_P: 0.05
+ REPETITION_PENALTY: 1.04</pre
  >
  <h3>Prompting Format</h3>
  <p>
  The model uses the following format I'll refer to as
  "DanChat-2":
  </p>
+ <h4>Why not ChatML?</h4>
+ <p>
+ ChatML is a widely used and standardized format for
+ LLMs, but it has a limitation: using standard
+ tokens as turn-ownership indicators can impart
+ biases to the model. DanChat-2 uses a unique special
+ token for each role, which helps to reduce these
+ biases and allows the model to adapt more readily to
+ different roles and tasks. It is possible to
+ achieve a similar effect with ChatML, but the
+ technique to do so would be nonstandard for the
+ ChatML format, and for users who do not use the
+ standard "assistant" and "user" roles, it would
+ fall apart entirely.
+ </p>
  <pre class="code-block">
  <|system|>system prompt<|endoftext|><|user|>Hi there!<|endoftext|><|assistant|>Hey, how can I help?<|endoftext|></pre
  >
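For quick reference, here is a minimal sketch of generating with the recommended settings above via the transformers library. The repo id is a placeholder (this page does not show the fine-tune's id), and `min_p` assumes a reasonably recent transformers release:

```python
# Minimal sketch, not the card's own example: generating with the
# recommended sampler settings via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PocketDoc/MODEL-ID-HERE"  # placeholder -- substitute the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# DanChat-2 prompt, left open at <|assistant|> for the model to complete.
prompt = (
    "<|system|>You are a helpful assistant.<|endoftext|>"
    "<|user|>Hi there!<|endoftext|>"
    "<|assistant|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,          # TEMPERATURE: 1.0
    top_p=0.95,               # TOP_P: 0.95
    min_p=0.05,               # MIN_P: 0.05
    repetition_penalty=1.04,  # REPETITION_PENALTY: 1.04
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```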
 
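Since DanChat-2 is simple concatenation, a serializer fits in a few lines. The `build_danchat2_prompt` helper below is illustrative only (the repository may ship its own chat template); the contrast with ChatML is that each role gets its own special token rather than a role name written as plain text between shared `<|im_start|>`/`<|im_end|>` markers:

```python
# Illustrative helper, not part of the repository: serialize messages
# into the DanChat-2 format shown in the diff above.
ROLE_TOKENS = {
    "system": "<|system|>",        # one dedicated special token per role,
    "user": "<|user|>",            # unlike ChatML, where the role name is
    "assistant": "<|assistant|>",  # plain text after a shared <|im_start|>
}
EOT = "<|endoftext|>"

def build_danchat2_prompt(messages: list[dict]) -> str:
    """Concatenate turns and leave an open <|assistant|> tag for generation."""
    parts = [f"{ROLE_TOKENS[m['role']]}{m['content']}{EOT}" for m in messages]
    return "".join(parts) + ROLE_TOKENS["assistant"]

print(build_danchat2_prompt([
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
]))
# -> <|system|>system prompt<|endoftext|><|user|>Hi there!<|endoftext|><|assistant|>
```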