stefanbschneider committed
Commit edafaa7 · verified · 1 parent: b2787eb

End of training

Files changed (2):
1. README.md +36 -11
2. generation_config.json +2 -2
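To reproduce results against this exact state of the repository, files can be pinned to the commit hash above. A minimal sketch with `huggingface_hub`, assuming the repository id is `stefanbschneider/led-base-16384-lfqa` (inferred from the model card title, not stated on this page):

```python
from huggingface_hub import hf_hub_download

# Fetch generation_config.json as of this commit rather than the current main.
# "edafaa7" is the abbreviated hash shown above; pass the full 40-character
# hash if the Hub does not resolve the short form.
path = hf_hub_download(
    repo_id="stefanbschneider/led-base-16384-lfqa",  # assumed repo id
    filename="generation_config.json",
    revision="edafaa7",
)
print(path)
```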
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 library_name: transformers
 license: apache-2.0
-base_model: allenai/led-base-16384
+base_model: stefanbschneider/led-base-16384-lfqa-ans-len-512
 tags:
 - generated_from_trainer
 model-index:
@@ -14,10 +14,11 @@ should probably proofread and complete it, then remove this comment. -->
 
 # led-base-16384-lfqa
 
-This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on an unknown dataset.
+This model is a fine-tuned version of [stefanbschneider/led-base-16384-lfqa-ans-len-512](https://huggingface.co/stefanbschneider/led-base-16384-lfqa-ans-len-512) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.6340
-- Rouge2: 0.027
+- Loss: 3.2615
+- Rouge2: 0.0416
+- Task: {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'}
 
 ## Model description
 
@@ -40,21 +41,45 @@ The following hyperparameters were used during training:
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - num_epochs: 1
+- mixed_precision_training: Native AMP
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | Rouge2 |
-|:-------------:|:------:|:----:|:---------------:|:------:|
-| 3.895 | 0.3906 | 50 | 3.7506 | 0.0266 |
-| 3.6803 | 0.7812 | 100 | 3.6340 | 0.027 |
+| Training Loss | Epoch | Step | Validation Loss | Rouge2 | Task |
+|:-------------:|:------:|:-----:|:---------------:|:------:|:----:|
+| 3.4849 | 0.0395 | 2000 | 3.4233 | 0.0387 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4744 | 0.0789 | 4000 | 3.4411 | 0.0398 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4919 | 0.1184 | 6000 | 3.4251 | 0.0378 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.487 | 0.1578 | 8000 | 3.4200 | 0.0397 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4443 | 0.1973 | 10000 | 3.3870 | 0.0376 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4597 | 0.2367 | 12000 | 3.3914 | 0.0405 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4525 | 0.2762 | 14000 | 3.3845 | 0.0398 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4618 | 0.3156 | 16000 | 3.3752 | 0.0424 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4573 | 0.3551 | 18000 | 3.3693 | 0.0421 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4164 | 0.3945 | 20000 | 3.3640 | 0.042 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.4125 | 0.4340 | 22000 | 3.3544 | 0.0412 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3828 | 0.4734 | 24000 | 3.3423 | 0.0409 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3965 | 0.5129 | 26000 | 3.3436 | 0.0416 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3993 | 0.5524 | 28000 | 3.3339 | 0.0384 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3909 | 0.5918 | 30000 | 3.3122 | 0.0414 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3745 | 0.6313 | 32000 | 3.3158 | 0.0416 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3665 | 0.6707 | 34000 | 3.3038 | 0.0424 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3351 | 0.7102 | 36000 | 3.2915 | 0.0435 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3629 | 0.7496 | 38000 | 3.2955 | 0.0436 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3465 | 0.7891 | 40000 | 3.2888 | 0.0395 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3127 | 0.8285 | 42000 | 3.2800 | 0.0414 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3385 | 0.8680 | 44000 | 3.2767 | 0.0413 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.2882 | 0.9074 | 46000 | 3.2685 | 0.0437 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3162 | 0.9469 | 48000 | 3.2639 | 0.0412 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
+| 3.3072 | 0.9863 | 50000 | 3.2615 | 0.0416 | {'name': 'Sequence-to-sequence Language Modeling', 'type': 'text2text-generation'} |
 
 
 ### Framework versions
 
-- Transformers 4.48.1
-- Pytorch 2.5.1
+- Transformers 4.48.3
+- Pytorch 2.5.1+cu121
 - Datasets 3.2.0
 - Tokenizers 0.21.0
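The card above documents a LED-based model for long-form question answering (LFQA). As a hedged usage sketch (not part of this commit), loading the checkpoint and generating an answer could look like the following; the question and context strings are invented, and putting global attention on the first token follows the common LED convention:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "stefanbschneider/led-base-16384-lfqa"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# LFQA-style input: a question paired with a long supporting context.
question = "How do solar panels generate electricity?"
context = "Solar panels consist of photovoltaic cells that ..."  # long document
inputs = tokenizer(question, context, return_tensors="pt",
                   truncation=True, max_length=16384)

# LED models typically expect global attention on at least the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

# Beam search, length limits, etc. come from generation_config.json by default.
output_ids = model.generate(**inputs, global_attention_mask=global_attention_mask)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```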
generation_config.json CHANGED
@@ -3,9 +3,9 @@
   "decoder_start_token_id": 0,
   "early_stopping": true,
   "length_penalty": 2.0,
-  "max_length": 1024,
+  "max_length": 512,
   "min_length": 100,
   "no_repeat_ngram_size": 3,
   "num_beams": 4,
-  "transformers_version": "4.48.1"
+  "transformers_version": "4.48.3"
 }
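The `generation_config.json` change halves the default `max_length` from 1024 to 512, so `generate()` now caps answers at 512 tokens unless a caller overrides it. A small sketch of how these defaults surface, again assuming the repo id:

```python
from transformers import GenerationConfig

# Defaults shipped with the checkpoint after this commit.
gen_config = GenerationConfig.from_pretrained(
    "stefanbschneider/led-base-16384-lfqa"  # assumed repo id
)
print(gen_config.max_length)  # 512 after this commit (was 1024)
print(gen_config.min_length)  # 100
print(gen_config.num_beams)   # 4

# Per-call arguments still take precedence over the checkpoint defaults:
# model.generate(**inputs, max_length=256)
```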