lapp0 committed
Commit 8fad7d7 · verified · Parent: 6671e3b

Training in progress, step 199

README.md CHANGED
@@ -1,7 +1,6 @@
 ---
-base_model: gpt2
-library_name: distily
 license: mit
+base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
@@ -9,23 +8,14 @@ model-index:
 results: []
 ---
 
-# gpt2_model_card_distily_test
-
-This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
-The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
+# gpt2_model_card_distily_test
 
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- eval_enwikippl: 3251.3369
-- eval_frwikippl: 12842.3994
-- eval_zhwikippl: 91987.7734
-- eval_loss: 2288.0
-- eval_runtime: 0.0553
-- eval_samples_per_second: 18.087
-- eval_steps_per_second: 18.087
-
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment.
+- Loss: 2608.0
 
 ## Model description
 
@@ -38,16 +28,12 @@ More information needed
 ## Training and evaluation data
 
 More information needed
--->
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- distillation_strategy: logits_activations
-- loss_fn: reverse_kl
-- train_embeddings: True
 - learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 2
@@ -56,19 +42,19 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - num_epochs: 1.0
 
-### Resource Usage
-Peak GPU Memory: 1.2452GB
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| No log | 0 | 0 | 7104.0 |
+| 2828.0 | 0.2513 | 50 | 2848.0 |
+| 2543.0 | 0.5025 | 100 | 2656.0 |
+| 2358.5 | 0.7538 | 150 | 2608.0 |
 
-### Model Results
-| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| 0 | 58331.5781 | 58190.1172 | 6944.0 | 0.0763 | 13.107 | 13.107 | 54568.5117 | 0 |
-| 0.5025 | 2778.4973 | 13039.9355 | 2080.0 | 0.0561 | 17.833 | 17.833 | 100748.5312 | 100 |
-| 0.7538 | 2581.9565 | 12580.9199 | 2048.0 | 0.0551 | 18.153 | 18.153 | 110134.0156 | 150 |
-| 0.2513 | 3251.3369 | 12842.3994 | 2288.0 | 0.0553 | 18.087 | 18.087 | 91987.7734 | 50 |
 
 ### Framework versions
-- Distily 0.1.0
+
 - Transformers 4.43.3
 - Pytorch 2.3.0
 - Datasets 2.20.0
+- Tokenizers 0.19.1
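
The removed hyperparameters above name `reverse_kl` as the distillation loss and `logits_activations` as the strategy. For reference, below is a minimal PyTorch sketch of a reverse KL term between student and teacher logits; it is an illustrative reconstruction, not Distily's actual code, and the `temperature` argument is an assumption.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """Reverse KL, i.e. KL(student || teacher), averaged over positions.

    Illustrative sketch only; Distily's loss_fn=reverse_kl may differ
    in reduction, masking, or temperature handling.
    """
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    # KL(s || t) = sum_v p_s(v) * (log p_s(v) - log p_t(v))
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
    return kl.mean()
```

Reverse KL is mode-seeking: it penalizes the student for placing probability where the teacher places little, so it tends to concentrate the student on the teacher's dominant modes rather than spreading mass across the teacher's full distribution.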
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0c5d0011739161d9c17776485506857ef1dd2b62ae6c0d61b8dbc277dce7d2af
+oid sha256:ede6bc4ec7bf5194f9241b0f94910ac13a64a959aaf3f15ce8d52043d2e1d58f
 size 248894656
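
Each binary in this commit is stored as a git-LFS pointer rather than raw bytes: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. A small sketch of reading those fields, using a hypothetical `parse_lfs_pointer` helper (not part of the git-lfs tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ede6bc4ec7bf5194f9241b0f94910ac13a64a959aaf3f15ce8d52043d2e1d58f\n"
    "size 248894656\n"
)
info = parse_lfs_pointer(pointer)
assert info["size"] == "248894656"  # 248894656 bytes, roughly 237 MiB
```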
runs/Aug05_21-36-23_232a0f8c3879/events.out.tfevents.1722893933.232a0f8c3879 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0740267c12301bcb8f1d119e4003ddac61f4ee82cdae812be3a4cd54eabc02c6
+size 10697
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6e161dcc8f36f69faff36b64a6213ba5bddc8cd05302c5231a512aa0a753e02
+oid sha256:1c2a5c86217f70aec074ac88ad817fb1abce1a61c9c752ea379eebb184f8c426
 size 907106628
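
The hyperparameters recorded in the README diff correspond to standard `transformers.TrainingArguments` fields. A hedged reconstruction follows; the `output_dir` is hypothetical, and Distily may set additional arguments not shown in the card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt2_model_card_distily_test",  # hypothetical path
    learning_rate=1e-4,             # learning_rate: 0.0001
    per_device_train_batch_size=1,  # train_batch_size: 1
    per_device_eval_batch_size=2,   # eval_batch_size: 2
    lr_scheduler_type="cosine",     # lr_scheduler_type: cosine
    num_train_epochs=1.0,           # num_epochs: 1.0
)
```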