End of training
README.md CHANGED
@@ -1,6 +1,7 @@
 ---
-license: mit
 base_model: gpt2
+library_name: distily
+license: mit
 tags:
 - Distily
 - generated_from_trainer
@@ -9,14 +10,17 @@ model-index:
 results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # gpt2_model_card_distily_test
 
-This model is …
+This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).
+
+The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
+
 It achieves the following results on the evaluation set:
-- …
+- train_loss: 2109.4855
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment.
 
 ## Model description
 
@@ -29,12 +33,16 @@ More information needed
 ## Training and evaluation data
 
 More information needed
+-->
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
+- distillation_strategy: logits_activations
+- loss_fn: reverse_kl
+- train_embeddings: True
 - learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 2
@@ -43,25 +51,18 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - num_epochs: 1.0
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| … | … | … | … |
-| … | … | … | … |
-| … | … | … | … |
-| … | … | … | … |
-| … | … | … | … |
-| 1731.625 | 0.5005 | 500 | 1736.0 |
-| 2095.75 | 0.6006 | 600 | 1680.0 |
-| 1991.0 | 0.7007 | 700 | 1680.0 |
-| 1994.0 | 0.8008 | 800 | 1672.0 |
-| 2166.5 | 0.9009 | 900 | 1664.0 |
-
+### Model Results
+| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step | train_loss |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| 0 | 61518.3633 | 57357.1172 | 7104.0 | 0.1065 | 9.388 | 9.388 | 60678.2734 | 0 | |
+| 0.2002002002002002 | 1984.4683 | 9672.7939 | 2192.0 | 0.0547 | 18.295 | 18.295 | 121910.375 | 200 | |
+| 0.4004004004004004 | 1589.3818 | 7626.9956 | 2048.0 | 0.0545 | 18.334 | 18.334 | 74891.5859 | 400 | |
+| 0.6006006006006006 | 1461.5446 | 7612.6294 | 1968.0 | 0.0554 | 18.063 | 18.063 | 75592.3516 | 600 | |
+| 0.8008008008008008 | 1401.9131 | 7065.2969 | 1960.0 | 0.0547 | 18.283 | 18.283 | 59395.5664 | 800 | |
+| | | | | | | | | | 2109.4855 |
 
 ### Framework versions
-
+- Distily 0.1.0
 - Transformers 4.43.3
 - Pytorch 2.3.0
 - Datasets 2.20.0
-- Tokenizers 0.19.1
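Note on the `loss_fn: reverse_kl` hyperparameter added above: reverse KL measures KL(student || teacher) rather than the forward KL(teacher || student) of classic distillation. Below is a minimal PyTorch sketch of a reverse KL loss over logits, under the usual definition; the function name and details are illustrative, not Distily's actual implementation.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Reverse KL divergence KL(student || teacher) over the vocabulary dimension.

    Forward KL is mean-seeking; reverse KL is mode-seeking, pushing the student
    toward the teacher's high-probability regions.
    (Illustrative sketch; not Distily's actual code.)
    """
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    # KL(p || q) with p = student: sum_v p * (log p - log q), then average
    kl = torch.sum(student_log_probs.exp() * (student_log_probs - teacher_log_probs), dim=-1)
    return kl.mean()
```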
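The `eval_*ppl` columns in the results table are perplexities, evidently on English, French, and Chinese Wikipedia samples. A generic sketch of perplexity evaluation for a causal LM with `transformers` follows; the exact evaluation texts and windowing Distily uses are not specified in the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    """Perplexity of a causal LM on one text: exp of the mean next-token NLL."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model internally shifts targets and
        # returns the mean cross-entropy over next-token predictions.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()
```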
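Since the student keeps GPT-2's architecture and tokenizer, it loads like any Hub causal LM. A hypothetical usage snippet; the repo id here is assumed from the card title and GitHub namespace and may differ from the actual Hub path.

```python
from transformers import pipeline

# Repo id assumed from the card title; replace with the actual Hub path.
generator = pipeline("text-generation", model="lapp0/gpt2_model_card_distily_test")
print(generator("Knowledge distillation works by", max_new_tokens=30)[0]["generated_text"])
```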