Text Generation · Transformers · PyTorch · Safetensors · English · mistral · text-generation-inference
Commit e6fa338 (verified), committed by t1101675 and nielsr (HF Staff) · 1 parent: 3bbeaa3

Add link to paper (#2)

- Add link to paper (041614c4bcf30bc8cc49d25e249c112cb3238eda)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -1,22 +1,22 @@
 ---
-license: apache-2.0
 datasets:
 - togethercomputer/RedPajama-Data-1T
 language:
 - en
-pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 ## PDS-160M
 
-[paper](https://arxiv.org/abs/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection)
+[paper](https://huggingface.co/papers/2410.07064) | [code](https://github.com/microsoft/LMOps/tree/main/data_selection)
 
 **PDS-160M** is a 160M model with the [Mistral](https://arxiv.org/abs/2310.06825) architecture, pre-trained from scratch on data selected from the CC split of [RedPajama](https://github.com/togethercomputer/RedPajama-Data) using the PDS framework.
 
 The PDS framework is based on [Pontryagin's maximum principle](https://en.wikipedia.org/wiki/Pontryagin%27s_maximum_principle) for optimal pre-training data selection, which not only enjoys strong theoretical support but is also scalable for training large language models.
 
-Please refer to our [paper](https://arxiv.org/abs/2410.07064) for more details.
+Please refer to our [paper](https://huggingface.co/papers/2410.07064) for more details.
 
 ### Overview of the theory:
 
@@ -32,7 +32,7 @@ Please refer to our [paper](https://arxiv.org/abs/2410.07064) for more details.
 
 ### Evaluation
 
-PDS-selected data improves the performance of language models pre-trained from scratch and saves pre-training comptation. The improvement scales up to large model sizes.
+PDS-selected data improves the performance of language models pre-trained from scratch and saves pre-training computation. The improvement scales up to large model sizes.
 
 <p align='left'>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/624ac662102fcdff87be51b9/6undIr37d10qD73TDiPDK.png" width="600">
@@ -51,4 +51,4 @@ PDS-selected data improves the performance of language models pre-trained from s
 journal={arXiv preprint arXiv:2410.07064},
 year={2024}
 }
-```
+```
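The card itself does not include a usage snippet. Since the front matter declares `library_name: transformers` and `pipeline_tag: text-generation`, loading the checkpoint would follow the standard `transformers` pattern; a minimal sketch, in which `REPO_ID` is a placeholder (substitute the actual Hugging Face model id for PDS-160M):

```python
# Minimal sketch of loading a text-generation checkpoint with transformers.
# REPO_ID is a placeholder, not the verified repo id of this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "PDS-160M"  # assumption: replace with the real model id

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID)

# Generate a short continuation from a prompt.
inputs = tokenizer("Data selection for pre-training", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is a 160M-parameter Mistral-architecture checkpoint, it should run comfortably on CPU; no special configuration beyond the default `AutoModelForCausalLM` path should be needed.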