lapp0 committed (verified)
Commit 1bd9938 · 1 Parent(s): 9918853

End of training

Files changed (1): README.md (+24 -23)

README.md CHANGED
@@ -1,6 +1,7 @@
 ---
-license: mit
 base_model: gpt2
+library_name: distily
+license: mit
 tags:
 - Distily
 - generated_from_trainer
@@ -9,14 +10,17 @@ model-index:
   results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # gpt2_model_card_distily_test

-This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
+This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).
+
+The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
+
 It achieves the following results on the evaluation set:
-- Loss: 1664.0
+- train_loss: 2109.4855
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment.

 ## Model description

@@ -29,12 +33,16 @@ More information needed
 ## Training and evaluation data

 More information needed
+-->

 ## Training procedure

 ### Training hyperparameters

 The following hyperparameters were used during training:
+- distillation_strategy: logits_activations
+- loss_fn: reverse_kl
+- train_embeddings: True
 - learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 2
@@ -43,25 +51,18 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - num_epochs: 1.0

-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| No log | 0 | 0 | 6368.0 |
-| 2426.5 | 0.1001 | 100 | 1944.0 |
-| 2146.5 | 0.2002 | 200 | 1856.0 |
-| 2255.0 | 0.3003 | 300 | 1808.0 |
-| 2116.75 | 0.4004 | 400 | 1792.0 |
-| 1731.625 | 0.5005 | 500 | 1736.0 |
-| 2095.75 | 0.6006 | 600 | 1680.0 |
-| 1991.0 | 0.7007 | 700 | 1680.0 |
-| 1994.0 | 0.8008 | 800 | 1672.0 |
-| 2166.5 | 0.9009 | 900 | 1664.0 |
-
+### Model Results
+| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step | train_loss |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| 0 | 61518.3633 | 57357.1172 | 7104.0 | 0.1065 | 9.388 | 9.388 | 60678.2734 | 0 | |
+| 0.2002002002002002 | 1984.4683 | 9672.7939 | 2192.0 | 0.0547 | 18.295 | 18.295 | 121910.375 | 200 | |
+| 0.4004004004004004 | 1589.3818 | 7626.9956 | 2048.0 | 0.0545 | 18.334 | 18.334 | 74891.5859 | 400 | |
+| 0.6006006006006006 | 1461.5446 | 7612.6294 | 1968.0 | 0.0554 | 18.063 | 18.063 | 75592.3516 | 600 | |
+| 0.8008008008008008 | 1401.9131 | 7065.2969 | 1960.0 | 0.0547 | 18.283 | 18.283 | 59395.5664 | 800 | |
+| | | | | | | | | | 2109.4855 |

 ### Framework versions
-
+- Distily 0.1.0
 - Transformers 4.43.3
 - Pytorch 2.3.0
 - Datasets 2.20.0
-- Tokenizers 0.19.1
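The updated card adds `loss_fn: reverse_kl` over `distillation_strategy: logits_activations`, i.e. a reverse-KL divergence between student and teacher logit distributions. As a rough illustration only, here is a minimal, generic PyTorch sketch of such a loss; this is not Distily's actual implementation (which the diff does not show), and the function name, temperature parameter, and tensor shapes are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def reverse_kl_logit_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    """Reverse KL, i.e. KL(student || teacher), over the vocabulary axis.

    Forward distillation typically minimizes KL(teacher || student); the
    reverse direction instead penalizes the student for placing probability
    mass where the teacher has little, making it mode-seeking.
    (Hypothetical helper for illustration, not the Distily API.)
    """
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    # KL(p_s || p_t) = sum_v p_s(v) * (log p_s(v) - log p_t(v))
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl.mean()

# Toy usage with assumed shapes: batch=1 (matching train_batch_size above),
# sequence length 8, vocabulary 50257 (GPT-2).
student_logits = torch.randn(1, 8, 50257)
teacher_logits = torch.randn(1, 8, 50257)
print(reverse_kl_logit_loss(student_logits, teacher_logits))
```

The `logits_activations` strategy name suggests this logit term is combined with losses on hidden activations, but the weighting between the two is not visible in this diff.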