Training in progress, step 199

Files changed (4)

README.md +16 -30
model.safetensors +1 -1
runs/Aug05_21-36-23_232a0f8c3879/events.out.tfevents.1722893933.232a0f8c3879 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -1,7 +1,6 @@
 ---
-base_model: gpt2
-library_name: distily
 license: mit
 tags:
 - generated_from_trainer
 model-index:
@@ -9,23 +8,14 @@ model-index:
   results: []
 ---
-# gpt2_model_card_distily_test
-This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).
-The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 It achieves the following results on the evaluation set:
-- eval_enwikippl: 3251.3369
-- eval_frwikippl: 12842.3994
-- eval_zhwikippl: 91987.7734
-- eval_loss: 2288.0
-- eval_runtime: 0.0553
-- eval_samples_per_second: 18.087
-- eval_steps_per_second: 18.087
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment.
 ## Model description
@@ -38,16 +28,12 @@ More information needed
 ## Training and evaluation data
 More information needed
--->
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- distillation_strategy: logits_activations
-- loss_fn: reverse_kl
-- train_embeddings: True
 - learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 2
@@ -56,19 +42,19 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - num_epochs: 1.0
-### Resource Usage
-Peak GPU Memory: 1.2452GB
-### Model Results
-| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| 0 | 58331.5781 | 58190.1172 | 6944.0 | 0.0763 | 13.107 | 13.107 | 54568.5117 | 0 |
-| 0.5025 | 2778.4973 | 13039.9355 | 2080.0 | 0.0561 | 17.833 | 17.833 | 100748.5312 | 100 |
-| 0.7538 | 2581.9565 | 12580.9199 | 2048.0 | 0.0551 | 18.153 | 18.153 | 110134.0156 | 150 |
-| 0.2513 | 3251.3369 | 12842.3994 | 2288.0 | 0.0553 | 18.087 | 18.087 | 91987.7734 | 50 |
 ### Framework versions
-- Distily 0.1.0
 - Transformers 4.43.3
 - Pytorch 2.3.0
 - Datasets 2.20.0

 ---
 license: mit
+base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
   results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# gpt2_model_card_distily_test
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2608.0
 ## Model description
 ## Training and evaluation data
 More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 2
 - lr_scheduler_type: cosine
 - num_epochs: 1.0
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| No log        | 0      | 0    | 7104.0          |
+| 2828.0        | 0.2513 | 50   | 2848.0          |
+| 2543.0        | 0.5025 | 100  | 2656.0          |
+| 2358.5        | 0.7538 | 150  | 2608.0          |
 ### Framework versions
 - Transformers 4.43.3
 - Pytorch 2.3.0
 - Datasets 2.20.0
+- Tokenizers 0.19.1
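
The fractional epochs in the training results table are consistent with one epoch spanning 199 optimizer steps, which also matches the step number in the commit title. This is an inference from the numbers shown, not something the model card states. A minimal sketch of the arithmetic, where `STEPS_PER_EPOCH` and `epoch_at` are hypothetical names introduced for illustration:

```python
# Assumption: one epoch equals 199 optimizer steps (inferred from the commit
# title "step 199" and num_epochs: 1.0; the model card does not state this).
STEPS_PER_EPOCH = 199


def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Return the fractional epoch reached at a given step, rounded to 4 places."""
    return round(step / steps_per_epoch, 4)


# Steps logged in the training results table.
for step in (0, 50, 100, 150):
    print(step, epoch_at(step))
```

Under that assumption, steps 50, 100, and 150 map to the epoch values 0.2513, 0.5025, and 0.7538 listed in the table.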

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0c5d0011739161d9c17776485506857ef1dd2b62ae6c0d61b8dbc277dce7d2af
 size 248894656

 version https://git-lfs.github.com/spec/v1
+oid sha256:ede6bc4ec7bf5194f9241b0f94910ac13a64a959aaf3f15ce8d52043d2e1d58f
 size 248894656
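
The entries for the binary files are Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. For `model.safetensors` only the `oid` changes while the size stays at 248894656, which suggests new weights with the same tensor layout. A minimal sketch of reading such a pointer, assuming the three-line layout shown in this diff (`parse_lfs_pointer` is a hypothetical helper, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New-side pointer for model.safetensors, copied from the diff above.
new_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ede6bc4ec7bf5194f9241b0f94910ac13a64a959aaf3f15ce8d52043d2e1d58f
size 248894656
"""

info = parse_lfs_pointer(new_pointer)
print(info["hash_algo"], info["size"])
```

Comparing the parsed `digest` and `size` of the old and new pointers is enough to tell whether the underlying object changed without downloading it.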

runs/Aug05_21-36-23_232a0f8c3879/events.out.tfevents.1722893933.232a0f8c3879 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0740267c12301bcb8f1d119e4003ddac61f4ee82cdae812be3a4cd54eabc02c6
+size 10697

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a6e161dcc8f36f69faff36b64a6213ba5bddc8cd05302c5231a512aa0a753e02
 size 907106628

 version https://git-lfs.github.com/spec/v1
+oid sha256:1c2a5c86217f70aec074ac88ad817fb1abce1a61c9c752ea379eebb184f8c426
 size 907106628