---
license: apache-2.0
base_model:
- openai-community/gpt2-large
---
# Model Card

### Summary

<!-- Provide a quick summary of what the model is/does. -->

This is a supervised fine-tuned (SFT) model for text summarization, based on GPT-2 (large). It was fine-tuned on the filtered version of the TL;DR training dataset, which can be found and downloaded here: [https://github.com/openai/summarize-from-feedback](https://github.com/openai/summarize-from-feedback).
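
As a quick reference, below is a minimal inference sketch with the `transformers` library. The repository id is a placeholder for this model's Hub id, and the prompt layout (post text followed by "TL;DR:") follows the usual TL;DR convention from the summarize-from-feedback setup; both are assumptions rather than details documented in this card.

```python
# Minimal inference sketch (repo id and prompt format are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "machine-teaching-group/<this-model>"  # placeholder: replace with this repo's id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "POST: <reddit post text>\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)

# Decode only the newly generated tokens (the summary), not the prompt.
summary = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```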

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Machine Teaching Group
- **Finetuned from model:** openai-community/gpt2-large

### Training Details

This model was trained with the TRL library, using Hugging Face's SFTTrainer class.
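
For orientation only, a rough sketch of an SFT setup with TRL's SFTTrainer is shown below; this is not the authors' training script, and the toy dataset, `output_dir`, `max_seq_length`, and text formatting are assumptions.

```python
# Rough TRL SFTTrainer sketch (illustrative, not the exact training script).
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical pre-formatted dataset with a single "text" column,
# e.g. "<post> ... TL;DR: <summary>".
train_dataset = Dataset.from_dict({"text": ["POST: example post\nTL;DR: example summary"]})

config = SFTConfig(
    output_dir="gpt2-large-tldr-sft",  # assumption
    dataset_text_field="text",
    max_seq_length=1024,               # assumption; not stated in this card
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    seed=2024,
)

trainer = SFTTrainer(
    model="openai-community/gpt2-large",  # base model named in this card
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```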

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The filtered version of the TL;DR training dataset, which can be downloaded here: [https://openaipublic.blob.core.windows.net/summarize-from-feedback/datasets/tldr_3_filtered/train.jsonl](https://openaipublic.blob.core.windows.net/summarize-from-feedback/datasets/tldr_3_filtered/train.jsonl).
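
Since the split is distributed as a JSON Lines file, one way to load it is directly from the URL with the `datasets` library; the field names are not documented in this card, so the sketch below simply inspects them.

```python
# Sketch: load the filtered TL;DR training split directly from the public URL.
from datasets import load_dataset

url = "https://openaipublic.blob.core.windows.net/summarize-from-feedback/datasets/tldr_3_filtered/train.jsonl"
train = load_dataset("json", data_files={"train": url})["train"]

print(train.column_names)  # inspect the available fields
print(train[0])            # look at one example
```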

#### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 2024
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 1
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1
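
The totals above follow from the per-device settings: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 8 × 8 × 1 = 64, and likewise for evaluation.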

### Framework Versions

- accelerate==0.26.1
- datasets==2.16.1
- transformers==4.45.2
- trl==0.11.2

### Compute Infrastructure and Hardware

A Slurm cluster with 8 × NVIDIA H100 GPUs.