floflodebilbao committed
Commit 8dade40 · verified · 1 Parent(s): 68ebf98

End of training

README.md CHANGED
@@ -22,21 +22,21 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.1993
- - Rouge1: 0.3322
- - Rouge2: 0.125
- - Rougel: 0.2565
- - Rougelsum: 0.2574
- - Gen Len: 28.14
- - Bleu: 0.0621
- - Precisions: 0.1225
- - Brevity Penalty: 0.8355
- - Length Ratio: 0.8477
- - Translation Length: 1024.0
+ - Loss: 1.2142
+ - Rouge1: 0.2852
+ - Rouge2: 0.0966
+ - Rougel: 0.2231
+ - Rougelsum: 0.2243
+ - Gen Len: 28.38
+ - Bleu: 0.0405
+ - Precisions: 0.0919
+ - Brevity Penalty: 0.8771
+ - Length Ratio: 0.8841
+ - Translation Length: 1068.0
  - Reference Length: 1208.0
- - Precision: 0.8829
- - Recall: 0.878
- - F1: 0.8804
+ - Precision: 0.8739
+ - Recall: 0.8718
+ - F1: 0.8728
  - Hashcode: roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1)
 
  ## Model description
@@ -56,7 +56,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 0.003
+ - learning_rate: 0.002
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
@@ -64,27 +64,22 @@ The following hyperparameters were used during training:
  - total_train_batch_size: 16
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: linear
- - num_epochs: 15
+ - num_epochs: 10
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Precision | Recall | F1 | Hashcode |
  |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:---------:|:------:|:------:|:---------------------------------------------------------:|
- | 19.4075 | 1.0 | 7 | 7.8749 | 0.001 | 0.0 | 0.001 | 0.001 | 31.0 | 0.0 | 0.0 | 0.238 | 0.4106 | 496.0 | 1208.0 | 0.7126 | 0.8072 | 0.7569 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 5.5271 | 2.0 | 14 | 4.1473 | 0.2045 | 0.0633 | 0.1651 | 0.1649 | 26.58 | 0.0276 | 0.076 | 0.7723 | 0.7947 | 960.0 | 1208.0 | 0.8611 | 0.857 | 0.8589 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 3.6663 | 3.0 | 21 | 3.2820 | 0.2606 | 0.077 | 0.1974 | 0.1984 | 28.74 | 0.038 | 0.0817 | 0.8882 | 0.894 | 1080.0 | 1208.0 | 0.8597 | 0.8624 | 0.861 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 2.8421 | 4.0 | 28 | 1.7852 | 0.2793 | 0.0975 | 0.2179 | 0.2176 | 27.3 | 0.0435 | 0.0968 | 0.848 | 0.8584 | 1037.0 | 1208.0 | 0.8737 | 0.8682 | 0.8709 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 1.8176 | 5.0 | 35 | 1.2630 | 0.3385 | 0.1361 | 0.268 | 0.2687 | 27.06 | 0.0618 | 0.1275 | 0.8161 | 0.8311 | 1004.0 | 1208.0 | 0.8893 | 0.8794 | 0.8842 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 1.2443 | 6.0 | 42 | 1.2196 | 0.3295 | 0.1287 | 0.2614 | 0.2626 | 28.1 | 0.0623 | 0.1201 | 0.8584 | 0.8675 | 1048.0 | 1208.0 | 0.8828 | 0.8789 | 0.8808 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 1.1068 | 7.0 | 49 | 1.2106 | 0.3103 | 0.1176 | 0.2506 | 0.2509 | 27.96 | 0.0568 | 0.1145 | 0.8527 | 0.8626 | 1042.0 | 1208.0 | 0.8775 | 0.8738 | 0.8756 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 1.0516 | 8.0 | 56 | 1.2039 | 0.3152 | 0.1158 | 0.247 | 0.2473 | 27.28 | 0.0529 | 0.1147 | 0.821 | 0.8353 | 1009.0 | 1208.0 | 0.8797 | 0.8746 | 0.8771 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.9782 | 9.0 | 63 | 1.1941 | 0.3359 | 0.1305 | 0.2622 | 0.2616 | 28.0 | 0.0685 | 0.1276 | 0.8518 | 0.8618 | 1041.0 | 1208.0 | 0.8829 | 0.877 | 0.8799 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.9384 | 10.0 | 70 | 1.1929 | 0.3341 | 0.1301 | 0.2582 | 0.2584 | 27.82 | 0.0676 | 0.1248 | 0.8432 | 0.8543 | 1032.0 | 1208.0 | 0.8815 | 0.8767 | 0.879 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.9097 | 11.0 | 77 | 1.1998 | 0.3319 | 0.1244 | 0.2529 | 0.2531 | 28.32 | 0.0605 | 0.1192 | 0.8537 | 0.8634 | 1043.0 | 1208.0 | 0.8818 | 0.8773 | 0.8795 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.8865 | 12.0 | 84 | 1.1992 | 0.3194 | 0.1101 | 0.2499 | 0.2505 | 28.52 | 0.0597 | 0.1145 | 0.8584 | 0.8675 | 1048.0 | 1208.0 | 0.8789 | 0.8754 | 0.8771 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.8648 | 13.0 | 91 | 1.1958 | 0.3326 | 0.122 | 0.2536 | 0.253 | 28.5 | 0.065 | 0.1213 | 0.865 | 0.8733 | 1055.0 | 1208.0 | 0.8826 | 0.8774 | 0.8799 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.85 | 14.0 | 98 | 1.1967 | 0.3343 | 0.1233 | 0.2544 | 0.254 | 28.36 | 0.0627 | 0.1211 | 0.8432 | 0.8543 | 1032.0 | 1208.0 | 0.8814 | 0.8773 | 0.8793 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
- | 0.8362 | 15.0 | 105 | 1.1993 | 0.3322 | 0.125 | 0.2565 | 0.2574 | 28.14 | 0.0621 | 0.1225 | 0.8355 | 0.8477 | 1024.0 | 1208.0 | 0.8829 | 0.878 | 0.8804 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 22.2581 | 1.0 | 7 | 6.5353 | 0.084 | 0.0147 | 0.0714 | 0.0714 | 31.0 | 0.0047 | 0.0247 | 0.5558 | 0.63 | 761.0 | 1208.0 | 0.7817 | 0.8234 | 0.8014 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 6.7792 | 2.0 | 14 | 5.1759 | 0.1642 | 0.0129 | 0.13 | 0.1296 | 30.46 | 0.0 | 0.044 | 0.755 | 0.7806 | 943.0 | 1208.0 | 0.8343 | 0.8356 | 0.8349 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 4.5124 | 3.0 | 21 | 3.7445 | 0.2094 | 0.0517 | 0.1669 | 0.1666 | 28.9 | 0.021 | 0.0606 | 0.8336 | 0.846 | 1022.0 | 1208.0 | 0.8516 | 0.8529 | 0.8521 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 3.5042 | 4.0 | 28 | 3.1497 | 0.2314 | 0.0579 | 0.1774 | 0.1772 | 29.1 | 0.0317 | 0.0716 | 0.8537 | 0.8634 | 1043.0 | 1208.0 | 0.855 | 0.8584 | 0.8567 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 2.8574 | 5.0 | 35 | 2.0950 | 0.2342 | 0.0664 | 0.1895 | 0.1897 | 28.34 | 0.0325 | 0.0756 | 0.8584 | 0.8675 | 1048.0 | 1208.0 | 0.8581 | 0.8605 | 0.8593 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 2.0046 | 6.0 | 42 | 1.4599 | 0.2643 | 0.0843 | 0.2074 | 0.2081 | 28.18 | 0.036 | 0.0853 | 0.8678 | 0.8758 | 1058.0 | 1208.0 | 0.8665 | 0.8652 | 0.8658 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 1.4948 | 7.0 | 49 | 1.2786 | 0.2831 | 0.0921 | 0.2203 | 0.2208 | 28.3 | 0.0413 | 0.0893 | 0.8855 | 0.8916 | 1077.0 | 1208.0 | 0.8703 | 0.8681 | 0.8691 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 1.2731 | 8.0 | 56 | 1.2338 | 0.2802 | 0.096 | 0.2204 | 0.2221 | 28.26 | 0.0406 | 0.0893 | 0.8753 | 0.8825 | 1066.0 | 1208.0 | 0.8729 | 0.8705 | 0.8717 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 1.1977 | 9.0 | 63 | 1.2179 | 0.2834 | 0.0991 | 0.2233 | 0.2244 | 28.42 | 0.0409 | 0.0919 | 0.8725 | 0.88 | 1063.0 | 1208.0 | 0.8745 | 0.8722 | 0.8733 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
+ | 1.1717 | 10.0 | 70 | 1.2142 | 0.2852 | 0.0966 | 0.2231 | 0.2243 | 28.38 | 0.0405 | 0.0919 | 0.8771 | 0.8841 | 1068.0 | 1208.0 | 0.8739 | 0.8718 | 0.8728 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.1) |
 
 
  ### Framework versions
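The final-epoch summary metrics above are internally consistent: the length ratio and the BLEU brevity penalty follow from the translation and reference lengths, and the F1 is the harmonic mean of the logged precision and recall (the hashcode suggests these are BERTScore values, though the card does not say so explicitly). A minimal stdlib sketch to sanity-check them, assuming the standard BLEU brevity-penalty formula:

```python
import math

# Final-epoch values reported in the card.
translation_len = 1068.0
reference_len = 1208.0
precision, recall = 0.8739, 0.8718

# Length ratio = hypothesis length / reference length.
length_ratio = translation_len / reference_len

# BLEU brevity penalty: exp(1 - ref/hyp) when the hypothesis is
# shorter than the reference, else 1.0.
bp = math.exp(1.0 - reference_len / translation_len) if translation_len < reference_len else 1.0

# F1 as the harmonic mean of the logged precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(length_ratio, 4), round(bp, 4), round(f1, 4))
# → 0.8841 0.8771 0.8728, matching the card's Length Ratio,
#   Brevity Penalty, and F1.
```

The same relations hold row by row in the training-results table, which is a quick way to catch transcription errors when editing the card by hand.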
adapter_config.json CHANGED
@@ -24,10 +24,10 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "v",
- "q",
  "k",
- "o"
+ "o",
+ "v",
+ "q"
  ],
  "task_type": "SEQ_2_SEQ_LM",
  "trainable_token_indices": null,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ea0e93c6ddf39e45a0c7a3f4fcdbe18e4d43e79842095d1d0c3d08ab9b256046
+ oid sha256:ce2b38ef30b963d66e9ef85b5767a7582dd6d73c56fe4d698cc823db9c557e96
  size 7119264
runs/Jul29_12-11-53_tardis/events.out.tfevents.1753783914.tardis.18756.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81b1cf0457fb01b02808edbef452230b35362d1be9eee6b61813f820191eb47d
+ size 19105
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a9046a1a744ac3f151868e204dfa62e7f7fe1d46d80ae5bc3c4047275edd395b
+ oid sha256:5272855a855b3568b6450f7b43bc4ab84b1cf3c36b67fb9ef04d9a9690734150
  size 5905