lapp0 committed on
Commit 069835c · verified · 1 parent: 8f5a054

End of training
README.md CHANGED
@@ -4,7 +4,7 @@ library_name: Distily
 tags:
 - generated_from_trainer
 model-index:
-- name: distily_TinyStories-33M
+- name: distily_TinyStories-33M_hs_attn
   results: []
 ---

@@ -15,13 +15,13 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
-- eval_enwikippl: 5885.9341
-- eval_frwikippl: 24294.9414
-- eval_zhwikippl: 264331.3438
-- eval_loss: 0.3987
-- eval_runtime: 51.5838
-- eval_samples_per_second: 48.465
-- eval_steps_per_second: 6.068
+- eval_enwikippl: 5505.2720
+- eval_frwikippl: 21773.6699
+- eval_zhwikippl: 149216.0938
+- eval_loss: 1.1383
+- eval_runtime: 51.1413
+- eval_samples_per_second: 48.884
+- eval_steps_per_second: 6.12

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -44,7 +44,7 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
+- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=5000.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=500.0, loss_fn=jsd, layer_mapper=None, projector=None))
 - train_embeddings: True
 - learning_rate: 4e-05
 - train_batch_size: 8
@@ -55,44 +55,44 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0

 ### Resource Usage
-Peak GPU Memory: 8.1416 GB
+Peak GPU Memory: 8.2949 GB

 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 20633.1680 | 131577.2812 | | | | | 7615.4468 |
-| 0 | 0 | 55266.375 | 57180.4375 | 6.2843 | 26.4237 | 94.612 | 11.845 | 56806.5430 |
-| 1000 | 0.0323 | 11414.3389 | 87921.1172 | 0.7142 | 26.3405 | 94.911 | 11.883 | 611931.1875 |
-| 2000 | 0.0646 | 8814.8682 | 53295.2305 | 0.6287 | 51.0412 | 48.98 | 6.132 | 507315.5625 |
-| 3000 | 0.0970 | 8020.6040 | 41652.3320 | 0.5662 | 29.4187 | 84.98 | 10.639 | 268242.625 |
-| 4000 | 0.1293 | 7153.7090 | 33178.5977 | 0.5197 | 40.0478 | 62.425 | 7.816 | 315367.9062 |
-| 5000 | 0.1616 | 6865.2617 | 31042.1875 | 0.4833 | 36.655 | 68.203 | 8.539 | 372857.25 |
-| 6000 | 0.1939 | 6828.5781 | 30924.2324 | 0.4539 | 47.1811 | 52.987 | 6.634 | 379690.5 |
-| 7000 | 0.2263 | 6329.1855 | 28375.3984 | 0.4331 | 51.6027 | 48.447 | 6.066 | 325812.875 |
-| 8000 | 0.2586 | 6229.7119 | 28592.2773 | 0.4123 | 51.6184 | 48.432 | 6.064 | 318159.5 |
-| 9000 | 0.2909 | 5885.9341 | 24294.9414 | 0.3987 | 51.5838 | 48.465 | 6.068 | 264331.3438 |
-| 10000 | 0.3232 | 5634.5898 | 24401.3828 | 0.3856 | 51.6233 | 48.428 | 6.063 | 248118.4062 |
-| 11000 | 0.3555 | 5849.9346 | 26113.8555 | 0.3761 | 51.5949 | 48.454 | 6.066 | 255583.9844 |
-| 12000 | 0.3879 | 5588.8325 | 23138.0430 | 0.3666 | 51.5384 | 48.508 | 6.073 | 255106.6875 |
-| 13000 | 0.4202 | 5498.4355 | 23102.1699 | 0.3618 | 51.6778 | 48.377 | 6.057 | 244239.3125 |
-| 14000 | 0.4525 | 5495.8716 | 24775.8398 | 0.3530 | 51.4537 | 48.587 | 6.083 | 271776.25 |
-| 15000 | 0.4848 | 5449.1309 | 23173.9512 | 0.3490 | 51.6347 | 48.417 | 6.062 | 235716.0625 |
-| 16000 | 0.5172 | 5464.8057 | 25348.3184 | 0.3430 | 48.3546 | 51.701 | 6.473 | 305992.3125 |
-| 17000 | 0.5495 | 5289.8618 | 23652.6602 | 0.3426 | 45.4673 | 54.985 | 6.884 | 290930.0625 |
-| 18000 | 0.5818 | 5362.6548 | 23393.9375 | 0.3378 | 42.8681 | 58.318 | 7.301 | 237739.0938 |
-| 19000 | 0.6141 | 5970.6357 | 32165.1016 | 0.3332 | 38.4757 | 64.976 | 8.135 | 492760.0312 |
-| 20000 | 0.6465 | 5680.7217 | 30225.7988 | 0.3322 | 31.9943 | 78.139 | 9.783 | 391742.4062 |
-| 21000 | 0.6788 | 5494.1685 | 27750.1914 | 0.3288 | 49.7191 | 50.283 | 6.295 | 288762.6875 |
-| 22000 | 0.7111 | 5693.0815 | 24919.4883 | 0.3272 | 49.6244 | 50.378 | 6.307 | 263274.4375 |
-| 23000 | 0.7434 | 5303.4346 | 25441.4375 | 0.3230 | 50.6137 | 49.394 | 6.184 | 261801.9844 |
-| 24000 | 0.7757 | 5458.4463 | 26499.6543 | 0.3217 | 51.4227 | 48.617 | 6.087 | 229626.5781 |
-| 25000 | 0.8081 | 5728.1162 | 28263.5859 | 0.3203 | 51.6717 | 48.382 | 6.057 | 258605.3594 |
-| 26000 | 0.8404 | 5226.1689 | 23493.1152 | 0.3186 | 51.4811 | 48.562 | 6.08 | 180660.6719 |
-| 27000 | 0.8727 | 5192.1890 | 22039.3262 | 0.3165 | 51.6376 | 48.414 | 6.061 | 194013.875 |
-| 28000 | 0.9050 | 5418.7476 | 22450.2344 | 0.3169 | 51.6539 | 48.399 | 6.06 | 182503.5312 |
-| 29000 | 0.9374 | 5170.8613 | 23860.3691 | 0.3141 | 51.4944 | 48.549 | 6.078 | 197516.9531 |
-| 30000 | 0.9697 | 5569.3379 | 25081.6641 | 0.3130 | 51.3337 | 48.701 | 6.097 | 160202.3281 |
-| 30938 | 1.0 | 5306.7280 | 25078.125 | 0.3130 | 51.5266 | 48.519 | 6.075 | 179410.5625 |
+| 0 | 0 | 57409.7656 | 57878.0820 | 11.7972 | 40.6672 | 61.475 | 7.697 | 56928.0781 |
+| 1000 | 0.0323 | 10372.9512 | 76930.4531 | 1.9053 | 41.7953 | 59.815 | 7.489 | 858113.625 |
+| 2000 | 0.0646 | 8020.6040 | 46711.9688 | 1.6472 | 41.0642 | 60.88 | 7.622 | 367518.3125 |
+| 3000 | 0.0970 | 8157.5376 | 45240.3945 | 1.5278 | 45.4508 | 55.005 | 6.887 | 515510.5625 |
+| 4000 | 0.1293 | 7411.5596 | 36822.6484 | 1.4337 | 51.1158 | 48.909 | 6.123 | 421034.4688 |
+| 5000 | 0.1616 | 6422.7583 | 28339.4023 | 1.3515 | 51.1748 | 48.852 | 6.116 | 267027.4375 |
+| 6000 | 0.1939 | 6131.3276 | 24695.6113 | 1.2750 | 50.9734 | 49.045 | 6.14 | 194273.2656 |
+| 7000 | 0.2263 | 5802.4341 | 23374.1562 | 1.2199 | 50.8571 | 49.157 | 6.155 | 168406.4688 |
+| 8000 | 0.2586 | 5621.9170 | 21168.1855 | 1.1773 | 51.0097 | 49.01 | 6.136 | 164012.0469 |
+| 9000 | 0.2909 | 5505.2720 | 21773.6699 | 1.1383 | 51.1413 | 48.884 | 6.12 | 149216.0938 |
+| 10000 | 0.3232 | 5617.5493 | 21623.7461 | 1.1134 | 51.0853 | 48.938 | 6.127 | 148977.0625 |
+| 11000 | 0.3555 | 5438.9810 | 21305.9277 | 1.0901 | 51.2289 | 48.801 | 6.11 | 148262.7188 |
+| 12000 | 0.3879 | 5601.4360 | 22292.5059 | 1.0718 | 51.1771 | 48.85 | 6.116 | 156941.4062 |
+| 13000 | 0.4202 | 5323.2368 | 21323.9785 | 1.0547 | 50.814 | 49.199 | 6.16 | 145089.7812 |
+| 14000 | 0.4525 | 5399.0068 | 21468.7930 | 1.0443 | 50.9066 | 49.11 | 6.149 | 147118.75 |
+| 15000 | 0.4848 | 5341.0449 | 20151.6465 | 1.0364 | 51.0013 | 49.018 | 6.137 | 134312.3438 |
+| 16000 | 0.5172 | 5234.6987 | 20021.3477 | 1.0292 | 51.7235 | 48.334 | 6.051 | 136299.75 |
+| 17000 | 0.5495 | 5317.8687 | 21308.9355 | 1.0156 | 54.7044 | 45.7 | 5.722 | 149495.2656 |
+| 18000 | 0.5818 | 5521.5405 | 20827.6855 | 1.0137 | 41.4159 | 60.363 | 7.557 | 141984.7344 |
+| 19000 | 0.6141 | 5249.7568 | 20254.2051 | 1.0055 | 42.1847 | 59.263 | 7.42 | 124202.625 |
+| 20000 | 0.6465 | 5582.7598 | 21764.4727 | 0.9982 | 46.3033 | 53.992 | 6.76 | 149495.2656 |
+| 21000 | 0.6788 | 5232.6621 | 20262.7637 | 0.9935 | 48.1287 | 51.944 | 6.503 | 145128.5312 |
+| 22000 | 0.7111 | 5320.3491 | 21332.9902 | 0.9854 | 50.6681 | 49.341 | 6.177 | 155605.7656 |
+| 23000 | 0.7434 | 5032.2212 | 19788.3945 | 0.9876 | 50.9899 | 49.029 | 6.138 | 141417.0312 |
+| 24000 | 0.7757 | 5318.2793 | 22064.2031 | 0.9832 | 50.912 | 49.104 | 6.148 | 152560.7188 |
+| 25000 | 0.8081 | 5365.5708 | 21906.0957 | 0.9779 | 51.1379 | 48.887 | 6.121 | 154034.5156 |
+| 26000 | 0.8404 | 5328.6157 | 22267.3691 | 0.9740 | 51.1115 | 48.913 | 6.124 | 154983.75 |
+| 27000 | 0.8727 | 5565.8813 | 22663.3496 | 0.9714 | 32.781 | 76.264 | 9.548 | 152397.8594 |
+| 28000 | 0.9050 | 5278.7847 | 20380.2637 | 0.9723 | 27.108 | 92.224 | 11.546 | 141190.6406 |
+| 29000 | 0.9374 | 5302.2002 | 20637.6562 | 0.9657 | 30.8728 | 80.977 | 10.138 | 139914.2969 |
+| 30000 | 0.9697 | 5366.4053 | 22920.4629 | 0.9633 | 27.0433 | 92.444 | 11.574 | 160202.3281 |
+| 30938 | 1.0 | 5286.9868 | 20498.4277 | 0.9628 | 27.0346 | 92.474 | 11.578 | 145051.0469 |

 ### Framework versions
 - Distily 0.2.0
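
The changed `distillation_objective` line above switches the hidden-state and attention loss components on: KL divergence on the logits (weight 1), MSE on hidden states (weight 5000.0), and Jensen–Shannon divergence on attention maps (weight 500.0). A minimal PyTorch sketch of such a combined objective is shown below; the function names and structure are illustrative assumptions, not Distily's actual API.

```python
import torch
import torch.nn.functional as F


def kl_loss(student_logits, teacher_logits):
    # KL(teacher || student) over the vocabulary, averaged over the batch.
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )


def jsd_loss(student_probs, teacher_probs, eps=1e-8):
    # Jensen-Shannon divergence between two probability distributions
    # (e.g. attention rows, which already sum to 1 along the last dim).
    s = student_probs.clamp_min(eps)
    t = teacher_probs.clamp_min(eps)
    m = 0.5 * (s + t)
    kl_sm = (s * (s / m).log()).sum(-1)
    kl_tm = (t * (t / m).log()).sum(-1)
    return (0.5 * (kl_sm + kl_tm)).mean()


def distillation_loss(student_out, teacher_out,
                      logits_w=1.0, hs_w=5000.0, attn_w=500.0):
    # Weighted sum of the three components named in the model card:
    # kl on logits, mse on hidden states, jsd on attention maps.
    loss = logits_w * kl_loss(student_out.logits, teacher_out.logits)
    n_hs = len(student_out.hidden_states)
    for s_hs, t_hs in zip(student_out.hidden_states, teacher_out.hidden_states):
        loss = loss + hs_w * F.mse_loss(s_hs, t_hs) / n_hs
    n_attn = len(student_out.attentions)
    for s_a, t_a in zip(student_out.attentions, teacher_out.attentions):
        loss = loss + attn_w * jsd_loss(s_a, t_a) / n_attn
    return loss
```

With `layer_mapper=None` and `projector=None`, student and teacher layers are compared one-to-one with no learned projection, which is only possible here because student and teacher share the TinyStories-33M architecture shapes.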
runs/Aug15_11-10-45_77e473d64567/events.out.tfevents.1723731347.77e473d64567 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6bb4b6bd610761dd2cdd28be0d99c6dc3abb383683d8b94a8978002fe5798e5d
+size 253
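
The `enwikippl`, `frwikippl`, and `zhwikippl` columns in the diff above report the model's perplexity on English, French, and Chinese Wikipedia text. Perplexity is the exponential of the mean per-token negative log-likelihood; the sketch below shows the formula only and is not Distily's actual evaluation code.

```python
import math


def perplexity(token_nlls):
    # Perplexity = exp(mean per-token negative log-likelihood).
    # A model assigning every token probability 1/k has perplexity k.
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Lower is better: the student's final enwikippl (~5287) is well below the teacher's (~20633), while its zhwikippl stays far above the teacher's, consistent with distillation on mostly English data.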