Text Generation
Safetensors
deepseek_v3
conversational
Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -59,7 +59,7 @@ For more information, read the [blog post](https://nousresearch.com/nous-psyche/
  **Training Duration:** TBD
  **Optimizer:** [DisTrO](https://github.com/NousResearch/DisTrO), decentralized version
 
- # Pretaining Dataset
+ # Pretraining Dataset
  For training data, we combined FineWeb (14T), FineWeb-2 with some less common languages removed (4T), and The Stack V2 (~.2T, upsampled to 1T tokens). We chose these datasets over more specialized pre-training datasets that aim to purely increase benchmark performance. Our goal with Consilience is to make a true "base" model -- one representative of the entirety of the creative output of humanity, and not merely trying to win the benchmaxxing game.
 
  Additionally, we're training this model continuously without a final data "annealing" step. While annealing helps base models respond more accurately to benchmarks and improves usability, it may potentially constrain creativity and interesting behaviors. Our solution is to simply release both versions: the raw, un-annealed base model first, followed by an annealed version to aid in usability.
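As a rough illustration of the mixture described in the README text above: the stated token budgets (14T FineWeb, 4T filtered FineWeb-2, and ~0.2T of The Stack V2 upsampled 5x to an effective 1T) imply sampling proportions of roughly 74% / 21% / 5%. The snippet below is a minimal back-of-the-envelope sketch of that arithmetic; the dataset labels and the idea of sampling by these weights are illustrative assumptions, not the actual Psyche training pipeline.

```python
# Back-of-the-envelope sampling weights implied by the token budgets in the model card.
# Only the numbers (14T, 4T, ~0.2T upsampled to 1T) come from the README; everything
# else here is an illustrative assumption, not the actual training data loader.
raw_tokens = {"FineWeb": 14e12, "FineWeb-2 (filtered)": 4e12, "The Stack V2": 0.2e12}
upsample = {"FineWeb": 1, "FineWeb-2 (filtered)": 1, "The Stack V2": 5}  # 0.2T -> 1T effective

effective = {name: raw_tokens[name] * upsample[name] for name in raw_tokens}
total = sum(effective.values())  # ~19T effective tokens

for name, tokens in effective.items():
    print(f"{name}: {tokens / 1e12:.1f}T effective tokens, sampling weight ~{tokens / total:.1%}")
# FineWeb: 14.0T effective tokens, sampling weight ~73.7%
# FineWeb-2 (filtered): 4.0T effective tokens, sampling weight ~21.1%
# The Stack V2: 1.0T effective tokens, sampling weight ~5.3%
```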