Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ That said, it does feel unique and fun to use. If you're the type of person who'
 ChatML

 ## Samplers

-Because stack merges introduce some unexpected noise to the model, I recommend higher min p than normal. I've been getting good results with min_p 0.
+Because stack merges introduce some unexpected noise to the model, I recommend a higher min_p than normal. I've been getting good results with min_p 0.09-0.11 -> temp 0.8-1.0; add your favorite anti-repetition sampler as needed.

 ### Configuration

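As a concrete reading of the recommendation above, here is a minimal sampler preset sketch. The key names follow llama.cpp conventions and are illustrative only; frontends spell these settings differently, and `repeat_penalty` stands in for whatever anti-repetition sampler you prefer.

```yaml
# Illustrative sampler preset (llama.cpp-style key names, not from this repo)
min_p: 0.10          # recommended range: 0.09-0.11
temp: 0.9            # recommended range: 0.8-1.0
repeat_penalty: 1.1  # optional anti-repetition sampler, tune to taste
```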
@@ -70,7 +70,7 @@ dtype: bfloat16
 tokenizer_source: ../Hermes-3-Llama-3.1-70B
 ```

-
+In the first few iterations I tried merging the tokenizers in an attempt to support both ChatML and L3, but it ended up breaking both of them. I also tried lower and higher slerp ratios, but this seems like the sweet spot.

 ---

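The tokenizer note above maps onto mergekit's `tokenizer_source` option. A minimal sketch of the two approaches, assuming current mergekit semantics where `union` combines the source vocabularies and a model path copies that model's tokenizer unchanged:

```yaml
# Attempted in early iterations: combine vocabularies to support both
# ChatML and L3 prompt formats (this broke both for this merge).
# tokenizer_source: union

# Kept: inherit the Hermes (ChatML) tokenizer as-is.
tokenizer_source: ../Hermes-3-Llama-3.1-70B
```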