End of training
README.md
CHANGED
@@ -16,9 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
+- eval_enwikippl: 18151.8379
+- eval_frwikippl: 38363.0352
+- eval_zhwikippl: 56660.7266
+- eval_loss: 0.0004
+- eval_runtime: 0.0556
+- eval_samples_per_second: 17.976
+- eval_steps_per_second: 17.976

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -53,15 +57,19 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0

 ### Resource Usage
-Peak GPU Memory: 1.
+Peak GPU Memory: 1.2477 GB

 ### Model Results
-
+`eval_` metrics:
+
+| enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl | epoch | step |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| |
-
-
-
+| | | | | | | | | **teacher eval** |
+| | | | | | | | 0 | 0 |
+| | | | | | | | 0.3030 | 30 |
+| | | | | | | | 0.6061 | 60 |
+| | | | | | | | 0.9091 | 90 |
+| | | | | | | | 1.0 | 99 |

 ### Framework versions
 - Distily 0.1.0
runs/Aug05_22-09-56_232a0f8c3879/events.out.tfevents.1722896045.232a0f8c3879
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:732d1756957d42440683dbc12e73edd0a722c827f4a31f505d9b6f654128d9e6
+size 245
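The added events file is stored as a Git LFS pointer: three `key value` lines giving the spec version, the object's SHA-256 digest, and its size in bytes. As a minimal sketch of how such a pointer could be read, assuming only the three fields shown in this diff (the helper name `parse_lfs_pointer` is hypothetical and not part of Git LFS or Distily):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer text into its version, oid, and size fields.

    Hypothetical helper for illustration; it assumes each line has the
    form "<key> <value>" as in the pointer shown in this commit.
    """
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first space only: key on the left, value on the right.
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid value has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object, in bytes
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:732d1756957d42440683dbc12e73edd0a722c827f4a31f505d9b6f654128d9e6
size 245
"""

info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])
```

The parsed `size` is the size of the TensorBoard events file itself, not of the pointer text that the repository tracks in its place.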