lapp0 committed
Commit 08eca12 · verified · Parent: 3518bf1

End of training
README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 1317.8882
- - eval_frwikippl: 6160.0112
- - eval_zhwikippl: 18720.5391
- - eval_loss: 9100.8643
- - eval_runtime: 21.7479
- - eval_samples_per_second: 45.982
- - eval_steps_per_second: 11.495
+ - eval_enwikippl: 26003.4414
+ - eval_frwikippl: 43473.625
+ - eval_zhwikippl: 54798.5430
+ - eval_loss: 21585.9199
+ - eval_runtime: 21.7886
+ - eval_samples_per_second: 45.896
+ - eval_steps_per_second: 11.474
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -65,20 +65,20 @@ Peak GPU Memory: 4.5037 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
- | 0 | 0 | 56316.9727 | 57063.6406 | 338362.375 | 21.585 | 46.329 | 11.582 | 59895.3867 |
- | 500 | 0.0808 | 2572.3994 | 11417.5068 | 11543.2324 | 21.6004 | 46.295 | 11.574 | 41503.2422 |
- | 1000 | 0.1616 | 2063.0137 | 9092.1670 | 10577.4082 | 21.6631 | 46.161 | 11.54 | 35115.7461 |
- | 1500 | 0.2424 | 1871.0499 | 7938.2212 | 10358.4639 | 21.603 | 46.29 | 11.572 | 27292.1016 |
- | 2000 | 0.3232 | 1695.2686 | 7227.6602 | 9993.9844 | 21.6259 | 46.241 | 11.56 | 23182.9023 |
- | 2500 | 0.4040 | 1612.1370 | 6837.5381 | 9819.5840 | 21.6071 | 46.281 | 11.57 | 20554.8633 |
- | 3000 | 0.4848 | 1560.7086 | 6503.2227 | 9719.8721 | 21.6469 | 46.196 | 11.549 | 18931.7129 |
- | 3500 | 0.5657 | 1508.6647 | 6356.7939 | 9568.1279 | 21.5545 | 46.394 | 11.599 | 18264.9355 |
- | 4000 | 0.6465 | 1453.8334 | 6368.4561 | 9422.2725 | 21.6702 | 46.146 | 11.537 | 18620.8105 |
- | 4500 | 0.7273 | 1410.3822 | 6362.3979 | 9391.8076 | 21.643 | 46.204 | 11.551 | 19780.6777 |
- | 5000 | 0.8081 | 1377.3173 | 6155.8887 | 9252.1602 | 21.6684 | 46.15 | 11.538 | 19119.7227 |
- | 5500 | 0.8889 | 1357.9893 | 6214.3193 | 9214.1436 | 21.8033 | 45.865 | 11.466 | 18457.4102 |
- | 6000 | 0.9697 | 1323.8629 | 6020.0356 | 9141.9521 | 21.6876 | 46.109 | 11.527 | 17406.4805 |
- | 6187 | 0.9999 | 1317.8882 | 6160.0112 | 9100.8643 | 21.7479 | 45.982 | 11.495 | 18720.5391 |
+ | 0 | 0 | 55339.3672 | 57682.5742 | 331776.0 | 21.609 | 46.277 | 11.569 | 57080.2930 |
+ | 500 | 0.0808 | 53840.9336 | 57103.8711 | 31504.6406 | 21.8206 | 45.828 | 11.457 | 60063.5586 |
+ | 1000 | 0.1616 | 46110.3789 | 54346.3320 | 25851.3926 | 21.7004 | 46.082 | 11.521 | 58033.3359 |
+ | 1500 | 0.2424 | 39930.7539 | 50785.9883 | 24363.0078 | 21.7826 | 45.908 | 11.477 | 56878.6953 |
+ | 2000 | 0.3232 | 35821.5273 | 48514.4766 | 23500.8008 | 21.6304 | 46.231 | 11.558 | 56064.2539 |
+ | 2500 | 0.4040 | 33513.9102 | 47385.3516 | 23009.5352 | 22.046 | 45.36 | 11.34 | 55873.6484 |
+ | 3000 | 0.4848 | 31516.0898 | 46269.4453 | 22568.4473 | 21.8604 | 45.745 | 11.436 | 55709.7695 |
+ | 3500 | 0.5657 | 30457.4590 | 45776.25 | 22369.2793 | 21.741 | 45.996 | 11.499 | 55598.2578 |
+ | 4000 | 0.6465 | 29546.6035 | 45307.4453 | 22169.5996 | 21.7185 | 46.044 | 11.511 | 55524.0742 |
+ | 4500 | 0.7273 | 28461.1484 | 44691.9258 | 21980.1602 | 21.6611 | 46.166 | 11.541 | 55228.2812 |
+ | 5000 | 0.8081 | 27586.4121 | 44246.7188 | 21925.6328 | 21.7331 | 46.013 | 11.503 | 55025.875 |
+ | 5500 | 0.8889 | 26811.3066 | 43867.7734 | 21713.1523 | 21.755 | 45.966 | 11.492 | 54930.3984 |
+ | 6000 | 0.9697 | 26139.0703 | 43621.0156 | 21624.0645 | 21.6556 | 46.177 | 11.544 | 54864.4336 |
+ | 6187 | 0.9999 | 26003.4414 | 43473.625 | 21585.9199 | 21.7886 | 45.896 | 11.474 | 54798.5430 |
 
 ### Framework versions
 - Distily 0.2.0
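The enwikippl, frwikippl, and zhwikippl columns presumably report perplexities on English, French, and Chinese Wikipedia text. Assuming the usual definition (perplexity is the exponential of the mean per-token cross-entropy), a minimal sketch of how such a number is computed:

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (in nats):
    exp of the mean cross-entropy over the sequence."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/2 (NLL = ln 2)
# has perplexity 2: it is as uncertain as a fair coin per token.
print(perplexity([math.log(2)] * 8))  # approximately 2.0
```

Under this definition, lower values mean the student predicts the evaluation text more confidently; the teacher-eval row (30.24 on enwiki) gives the reference the distilled student is chasing.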
runs/Aug10_06-50-39_93d6cbb3ad53/events.out.tfevents.1723276912.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:008a85dbbd7a24fdc998c3ca660036353b33486ab3679afffa30cc2226ed79c8
+ size 249
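The added events file is stored as a Git LFS pointer rather than as raw bytes: a small key-value text file per the git-lfs spec v1 it references. A sketch of reading one (the `parse_lfs_pointer` helper is hypothetical, not part of git-lfs itself):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file (spec v1) into a dict of its
    key-value lines: version, oid, size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer content added in this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:008a85dbbd7a24fdc998c3ca660036353b33486ab3679afffa30cc2226ed79c8
size 249
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 249
```

The `size` field is the byte length of the real object, and `oid` is its SHA-256, which LFS uses to fetch the actual TensorBoard event file on checkout.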