Update README.md
add usage descriptions

README.md CHANGED
@@ -29,6 +29,28 @@ paper: [Characterizing Verbatim Short-Term Memory in Neural Language Models](htt

This is a gpt2-small-like decoder-only transformer model trained on a 40M token subset of the [wikitext-103 dataset](https://paperswithcode.com/dataset/wikitext-103).

# Usage

You can download and load the model as follows:

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("Kristijan/gpt2_wt103-40m_12-layer")
```

Alternatively, if you have downloaded the checkpoint files from this repository, you can load the model from the local folder:

```python
from transformers import GPT2LMHeadModel

# Point this to the local folder that contains the downloaded checkpoint files.
model = GPT2LMHeadModel.from_pretrained(path_to_folder_with_checkpoint_files)
```

To tokenize your text for this model, use the [tokenizer trained on Wikitext-103](https://huggingface.co/Kristijan/wikitext-103-tokenizer).
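
As a quick end-to-end check, the sketch below loads that tokenizer together with the model and scores a sample sentence. This is only a minimal illustration, not part of the original model card: it assumes the tokenizer repository can be loaded with `AutoTokenizer`, and the example sentence is arbitrary.

```python
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

# Assumption: the Wikitext-103 tokenizer repo is compatible with AutoTokenizer;
# swap in the appropriate tokenizer class if it is distributed differently.
tokenizer = AutoTokenizer.from_pretrained("Kristijan/wikitext-103-tokenizer")
model = GPT2LMHeadModel.from_pretrained("Kristijan/gpt2_wt103-40m_12-layer")
model.eval()

inputs = tokenizer("The meaning of life is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

# outputs.loss is the mean token-level cross-entropy; outputs.logits has shape
# (batch_size, sequence_length, vocab_size) and holds the next-token scores.
print(outputs.loss.item(), outputs.logits.shape)
```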

# Intended uses

This checkpoint is intended for research purposes, for example for those interested in studying the behavior of transformer language models trained on smaller datasets.