End of training
README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_loss:
-- eval_runtime: 21.
-- eval_samples_per_second: 45.
-- eval_steps_per_second: 11.
+- eval_enwikippl: 26003.4414
+- eval_frwikippl: 43473.625
+- eval_zhwikippl: 54798.5430
+- eval_loss: 21585.9199
+- eval_runtime: 21.7886
+- eval_samples_per_second: 45.896
+- eval_steps_per_second: 11.474
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -65,20 +65,20 @@ Peak GPU Memory: 4.5037 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
-| 0 | 0 |
-| 500 | 0.0808 |
-| 1000 | 0.1616 |
-| 1500 | 0.2424 |
-| 2000 | 0.3232 |
-| 2500 | 0.4040 |
-| 3000 | 0.4848 |
-| 3500 | 0.5657 |
-| 4000 | 0.6465 |
-| 4500 | 0.7273 |
-| 5000 | 0.8081 |
-| 5500 | 0.8889 |
-| 6000 | 0.9697 |
-| 6187 | 0.9999 |
+| 0 | 0 | 55339.3672 | 57682.5742 | 331776.0 | 21.609 | 46.277 | 11.569 | 57080.2930 |
+| 500 | 0.0808 | 53840.9336 | 57103.8711 | 31504.6406 | 21.8206 | 45.828 | 11.457 | 60063.5586 |
+| 1000 | 0.1616 | 46110.3789 | 54346.3320 | 25851.3926 | 21.7004 | 46.082 | 11.521 | 58033.3359 |
+| 1500 | 0.2424 | 39930.7539 | 50785.9883 | 24363.0078 | 21.7826 | 45.908 | 11.477 | 56878.6953 |
+| 2000 | 0.3232 | 35821.5273 | 48514.4766 | 23500.8008 | 21.6304 | 46.231 | 11.558 | 56064.2539 |
+| 2500 | 0.4040 | 33513.9102 | 47385.3516 | 23009.5352 | 22.046 | 45.36 | 11.34 | 55873.6484 |
+| 3000 | 0.4848 | 31516.0898 | 46269.4453 | 22568.4473 | 21.8604 | 45.745 | 11.436 | 55709.7695 |
+| 3500 | 0.5657 | 30457.4590 | 45776.25 | 22369.2793 | 21.741 | 45.996 | 11.499 | 55598.2578 |
+| 4000 | 0.6465 | 29546.6035 | 45307.4453 | 22169.5996 | 21.7185 | 46.044 | 11.511 | 55524.0742 |
+| 4500 | 0.7273 | 28461.1484 | 44691.9258 | 21980.1602 | 21.6611 | 46.166 | 11.541 | 55228.2812 |
+| 5000 | 0.8081 | 27586.4121 | 44246.7188 | 21925.6328 | 21.7331 | 46.013 | 11.503 | 55025.875 |
+| 5500 | 0.8889 | 26811.3066 | 43867.7734 | 21713.1523 | 21.755 | 45.966 | 11.492 | 54930.3984 |
+| 6000 | 0.9697 | 26139.0703 | 43621.0156 | 21624.0645 | 21.6556 | 46.177 | 11.544 | 54864.4336 |
+| 6187 | 0.9999 | 26003.4414 | 43473.625 | 21585.9199 | 21.7886 | 45.896 | 11.474 | 54798.5430 |
 
 ### Framework versions
 - Distily 0.2.0
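The eval_*ppl columns above report perplexity, which is conventionally the exponential of the mean per-token negative log-likelihood (cross-entropy). A minimal sketch of that relationship, using only the Python standard library; the per-token NLL values below are made-up illustrative numbers, not values from this run:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token NLLs (natural log), for illustration only:
nlls = [3.2, 3.5, 3.1, 3.4]
ppl = perplexity(nlls)
```

A lower mean NLL on an evaluation corpus therefore translates directly into a lower perplexity, which is why the table's loss and ppl columns fall together over training.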
runs/Aug10_06-50-39_93d6cbb3ad53/events.out.tfevents.1723276912.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:008a85dbbd7a24fdc998c3ca660036353b33486ab3679afffa30cc2226ed79c8
+size 249