alex-shvets committed on
Commit 0121cef · verified · 1 Parent(s): 857ce58

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,3 +1,125 @@
- ---
- license: apache-2.0
- ---
+ ---
+ library_name: transformers
+ base_model: roberta-large-emopillars-contextless
+ metrics:
+ - f1
+ model-index:
+ - name: roberta-large-emopillars-contextless-isear
+   results: []
+ ---
+
+ # roberta-large-emopillars-contextless-isear
+
+ This model is a fine-tuned version of [roberta-large-emopillars-contextless](https://huggingface.co/alex-shvets/roberta-large-emopillars-contextless) on the [ISEAR dataset](https://paperswithcode.com/dataset/isear).
+
+ <img src="https://huggingface.co/datasets/alex-shvets/images/resolve/main/emopillars_color_2.png" width="450">
+
+ ## Model description
+
+ The model is a multi-label classifier over 28 emotional classes for a context-less scenario, fine-tuned on a dataset covering 7 classes (_anger_, _disgust_, _fear_, _sadness_, _joy_, _shame_, _guilt_). It detects emotions in the entire input (including context, if provided).
+
+ ## How to use
+
+ Here is how to use this model:
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_name = "alex-shvets/roberta-large-emopillars-contextless-isear"
+ threshold = 0.5
+ emotions = ["admiration", "amusement", "anger", "annoyance", "approval", "caring", "confusion",
+             "curiosity", "desire", "disappointment", "disapproval", "disgust", "embarrassment",
+             "excitement", "fear", "gratitude", "grief", "joy", "love", "nervousness", "optimism",
+             "pride", "realization", "relief", "remorse", "sadness", "surprise", "neutral"]
+ label_to_emotion = dict(enumerate(emotions))
+ # map the relevant subset of the 28 classes onto the 7 ISEAR classes
+ emotion_to_isear = {
+     "anger": "anger",
+     "disgust": "disgust",
+     "fear": "fear",
+     "sadness": "sadness",
+     "joy": "joy",
+     "embarrassment": "shame",
+     "remorse": "guilt"
+ }
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ pipe = pipeline("text-classification", model=model_name, truncation=True,
+                 return_all_scores=True, device=-1 if device.type == "cpu" else 0)
+ # each input is the plain utterance text (no context)
+ utterances = ["Ok is it just me or is anyone else getting goosebumps too???",
+               "Don’t know what to do",
+               "When a car is overtaking another and I am forced to drive off the road."]
+ outcome = pipe(utterances)
+ dominant_classes = [
+     [prediction for prediction in example if prediction['score'] >= threshold and
+      label_to_emotion[int(prediction['label'])] in emotion_to_isear]
+     for example in outcome
+ ]
+ for example in dominant_classes:
+     print(", ".join([
+         "%s: %.2f" % (emotion_to_isear[label_to_emotion[int(prediction['label'])]], prediction['score'])
+         for prediction in sorted(example, key=lambda x: x['score'], reverse=True)
+     ]))
+ # fear: 0.90
+ # sadness: 0.91
+ # anger: 1.00
+ ```
+
+ ## Training data
+
+ The training data consists of 6013 samples of the [ISEAR dataset](https://paperswithcode.com/dataset/isear).
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 752
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 8.0
+
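As a quick consistency check, these hyperparameters line up with the trainer state shipped in this commit: with 6013 training samples, a batch size of 4, 8 epochs, and a linear schedule (and assuming no gradient accumulation or warmup, which the card does not state explicitly), the implied step count and the learning rate logged at step 500 can be derived directly:

```python
import math

# Values taken from this model card; "no gradient accumulation" is an assumption.
num_samples = 6013   # training samples (see "Training data")
batch_size = 4       # train_batch_size
num_epochs = 8

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 12032, matching "global_step" in trainer_state.json

# Linear schedule without warmup: lr decays from its initial value to 0.
lr_at_500 = 2e-05 * (1 - 500 / total_steps)
print(lr_at_500)  # ~1.9169e-05, matching the first "learning_rate" log entry
```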
+ ### Framework versions
+
+ - Transformers 4.45.0.dev0
+ - Pytorch 2.4.0a0+gite3b9b71
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1
+
+ ## Evaluation
+
+ Scores for the evaluation on the test split (20% of the ISEAR dataset):
+
+ | **class** | **precision** | **recall** | **f1-score** | **support** |
+ | :--- | :---: | :---: | :---: | ---: |
+ | anger | 0.67 | 0.65 | 0.66 | 209 |
+ | disgust | 0.75 | 0.72 | 0.74 | 232 |
+ | fear | 0.88 | 0.81 | 0.84 | 205 |
+ | sadness | 0.71 | 0.78 | 0.74 | 198 |
+ | joy | 0.93 | 0.93 | 0.93 | 219 |
+ | shame | 0.64 | 0.66 | 0.65 | 222 |
+ | guilt | 0.75 | 0.72 | 0.73 | 218 |
+ | **micro avg** | 0.76 | 0.75 | 0.76 | 1503 |
+ | **macro avg** | 0.76 | 0.75 | 0.76 | 1503 |
+ | **weighted avg** | 0.76 | 0.75 | 0.76 | 1503 |
+ | **samples avg** | 0.75 | 0.75 | 0.75 | 1503 |
+
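For readers comparing the averaged rows: the macro average weighs every class equally, while the micro average pools true/false positives and negatives across all classes before computing a single F1. A minimal sketch with made-up counts (not the actual ISEAR test results):

```python
# Hypothetical per-class counts, for illustration only.
counts = {
    "anger": {"tp": 8, "fp": 2, "fn": 4},
    "fear":  {"tp": 9, "fp": 1, "fn": 1},
    "joy":   {"tp": 6, "fp": 8, "fn": 2},
}

def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Macro: average of per-class F1 scores (each class counts equally).
macro_f1 = sum(f1(**v) for v in counts.values()) / len(counts)

# Micro: pool the counts first, then compute one F1 (larger classes weigh more).
tp = sum(v["tp"] for v in counts.values())
fp = sum(v["fp"] for v in counts.values())
fn = sum(v["fn"] for v in counts.values())
micro_f1 = f1(tp, fp, fn)

print(round(macro_f1, 3), round(micro_f1, 3))  # 0.724 0.719
```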
+
+ For more details on the evaluation, please visit our [GitHub repository](https://github.com/alex-shvets/emopillars).
+
+ ## Disclaimer
+
+ <details>
+
+ <summary>Click to expand</summary>
+
+ The model published in this repository is intended for a generalist purpose and is available to third parties. This model may exhibit bias and/or other undesirable distortions.
+
+ When third parties deploy or provide systems and/or services to other parties using this model (or systems based on it), or become users of the model themselves, it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including those governing the use of Artificial Intelligence.
+
+ In no event shall the creator of the model be liable for any results arising from third parties' use of this model.
+
+ </details>
config.json ADDED
@@ -0,0 +1,89 @@
+ {
+   "_name_or_path": "roberta_mistfull_64batch_10epochs752",
+   "architectures": [
+     "RobertaForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "finetuning_task": "text-classification",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "0",
+     "1": "1",
+     "2": "10",
+     "3": "11",
+     "4": "12",
+     "5": "13",
+     "6": "14",
+     "7": "15",
+     "8": "16",
+     "9": "17",
+     "10": "18",
+     "11": "19",
+     "12": "2",
+     "13": "20",
+     "14": "21",
+     "15": "22",
+     "16": "23",
+     "17": "24",
+     "18": "25",
+     "19": "26",
+     "20": "27",
+     "21": "3",
+     "22": "4",
+     "23": "5",
+     "24": "6",
+     "25": "7",
+     "26": "8",
+     "27": "9"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "0": 0,
+     "1": 1,
+     "10": 2,
+     "11": 3,
+     "12": 4,
+     "13": 5,
+     "14": 6,
+     "15": 7,
+     "16": 8,
+     "17": 9,
+     "18": 10,
+     "19": 11,
+     "2": 12,
+     "20": 13,
+     "21": 14,
+     "22": 15,
+     "23": 16,
+     "24": 17,
+     "25": 18,
+     "26": 19,
+     "27": 20,
+     "3": 21,
+     "4": 22,
+     "5": 23,
+     "6": 24,
+     "7": 25,
+     "8": 26,
+     "9": 27
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "problem_type": "multi_label_classification",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.0.dev0",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 50265
+ }
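One detail worth noting in the config above: `"problem_type": "multi_label_classification"` means the classification head is trained with an independent sigmoid per class rather than a softmax, which is why several emotions can clear the 0.5 threshold at once in the usage example. A small illustrative sketch with hypothetical logits:

```python
import math

# Hypothetical logits for three classes, for illustration only.
logits = [2.0, -1.0, 0.5]

# Multi-label head: an independent sigmoid per class; scores need not sum to 1.
sigmoid = [1 / (1 + math.exp(-z)) for z in logits]

# Multi-class (softmax) head, for contrast: scores always sum to 1.
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]

print(round(sum(sigmoid), 3))  # 1.772 — independent per-class probabilities
print(round(sum(softmax), 3))  # 1.0
```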
merges.txt ADDED
The diff for this file is too large to render.
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:635c484e65e729288eb407ac8dd17bddf4fa5cdb7d567955a6ef56ab1628d8bf
+ size 1421602016
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50264": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "truncation_side": "left",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }
trainer_state.json ADDED
@@ -0,0 +1,426 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 8.0,
+   "eval_steps": 500,
+   "global_step": 12032,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.3324468085106383,
+       "grad_norm": 0.3100210130214691,
+       "learning_rate": 1.9168882978723405e-05,
+       "loss": 0.0604,
+       "step": 500
+     },
+     {
+       "epoch": 0.3324468085106383,
+       "eval_f1": 0.7222991689750693,
+       "eval_loss": 0.051721904426813126,
+       "eval_runtime": 17.61,
+       "eval_samples_per_second": 85.349,
+       "eval_steps_per_second": 10.676,
+       "step": 500
+     },
+     {
+       "epoch": 0.6648936170212766,
+       "grad_norm": 1.4195618629455566,
+       "learning_rate": 1.833776595744681e-05,
+       "loss": 0.0531,
+       "step": 1000
+     },
+     {
+       "epoch": 0.6648936170212766,
+       "eval_f1": 0.7194194885970975,
+       "eval_loss": 0.05466139316558838,
+       "eval_runtime": 17.5929,
+       "eval_samples_per_second": 85.432,
+       "eval_steps_per_second": 10.686,
+       "step": 1000
+     },
+     {
+       "epoch": 0.9973404255319149,
+       "grad_norm": 1.0888991355895996,
+       "learning_rate": 1.7506648936170213e-05,
+       "loss": 0.0519,
+       "step": 1500
+     },
+     {
+       "epoch": 0.9973404255319149,
+       "eval_f1": 0.737983375496928,
+       "eval_loss": 0.04508286714553833,
+       "eval_runtime": 17.6072,
+       "eval_samples_per_second": 85.363,
+       "eval_steps_per_second": 10.677,
+       "step": 1500
+     },
+     {
+       "epoch": 1.3297872340425532,
+       "grad_norm": 0.06017958000302315,
+       "learning_rate": 1.667553191489362e-05,
+       "loss": 0.0319,
+       "step": 2000
+     },
+     {
+       "epoch": 1.3297872340425532,
+       "eval_f1": 0.745473180731124,
+       "eval_loss": 0.05267561972141266,
+       "eval_runtime": 17.5974,
+       "eval_samples_per_second": 85.41,
+       "eval_steps_per_second": 10.683,
+       "step": 2000
+     },
+     {
+       "epoch": 1.6622340425531914,
+       "grad_norm": 5.658984184265137,
+       "learning_rate": 1.584441489361702e-05,
+       "loss": 0.0361,
+       "step": 2500
+     },
+     {
+       "epoch": 1.6622340425531914,
+       "eval_f1": 0.7482853223593965,
+       "eval_loss": 0.05346440523862839,
+       "eval_runtime": 17.5948,
+       "eval_samples_per_second": 85.423,
+       "eval_steps_per_second": 10.685,
+       "step": 2500
+     },
+     {
+       "epoch": 1.9946808510638299,
+       "grad_norm": 2.0736632347106934,
+       "learning_rate": 1.5013297872340426e-05,
+       "loss": 0.0322,
+       "step": 3000
+     },
+     {
+       "epoch": 1.9946808510638299,
+       "eval_f1": 0.7512027491408935,
+       "eval_loss": 0.05298233404755592,
+       "eval_runtime": 17.6025,
+       "eval_samples_per_second": 85.385,
+       "eval_steps_per_second": 10.68,
+       "step": 3000
+     },
+     {
+       "epoch": 2.327127659574468,
+       "grad_norm": 0.005054426845163107,
+       "learning_rate": 1.4182180851063831e-05,
+       "loss": 0.0174,
+       "step": 3500
+     },
+     {
+       "epoch": 2.327127659574468,
+       "eval_f1": 0.7525218560860794,
+       "eval_loss": 0.07012591511011124,
+       "eval_runtime": 17.5942,
+       "eval_samples_per_second": 85.426,
+       "eval_steps_per_second": 10.685,
+       "step": 3500
+     },
+     {
+       "epoch": 2.6595744680851063,
+       "grad_norm": 0.475389301776886,
+       "learning_rate": 1.3351063829787235e-05,
+       "loss": 0.0218,
+       "step": 4000
+     },
+     {
+       "epoch": 2.6595744680851063,
+       "eval_f1": 0.7464694014794889,
+       "eval_loss": 0.0661526769399643,
+       "eval_runtime": 17.6058,
+       "eval_samples_per_second": 85.37,
+       "eval_steps_per_second": 10.678,
+       "step": 4000
+     },
+     {
+       "epoch": 2.992021276595745,
+       "grad_norm": 0.06960093975067139,
+       "learning_rate": 1.2519946808510639e-05,
+       "loss": 0.0215,
+       "step": 4500
+     },
+     {
+       "epoch": 2.992021276595745,
+       "eval_f1": 0.7512864493996569,
+       "eval_loss": 0.06172482669353485,
+       "eval_runtime": 17.6025,
+       "eval_samples_per_second": 85.385,
+       "eval_steps_per_second": 10.68,
+       "step": 4500
+     },
+     {
+       "epoch": 3.324468085106383,
+       "grad_norm": 0.00278457417152822,
+       "learning_rate": 1.1688829787234044e-05,
+       "loss": 0.0111,
+       "step": 5000
+     },
+     {
+       "epoch": 3.324468085106383,
+       "eval_f1": 0.7572621035058431,
+       "eval_loss": 0.08673229813575745,
+       "eval_runtime": 17.6012,
+       "eval_samples_per_second": 85.392,
+       "eval_steps_per_second": 10.681,
+       "step": 5000
+     },
+     {
+       "epoch": 3.6569148936170213,
+       "grad_norm": 0.18681606650352478,
+       "learning_rate": 1.0857712765957446e-05,
+       "loss": 0.0118,
+       "step": 5500
+     },
+     {
+       "epoch": 3.6569148936170213,
+       "eval_f1": 0.7567567567567568,
+       "eval_loss": 0.08375687897205353,
+       "eval_runtime": 17.6087,
+       "eval_samples_per_second": 85.356,
+       "eval_steps_per_second": 10.677,
+       "step": 5500
+     },
+     {
+       "epoch": 3.9893617021276597,
+       "grad_norm": 0.149847149848938,
+       "learning_rate": 1.0026595744680852e-05,
+       "loss": 0.0137,
+       "step": 6000
+     },
+     {
+       "epoch": 3.9893617021276597,
+       "eval_f1": 0.7489075630252101,
+       "eval_loss": 0.07563214004039764,
+       "eval_runtime": 17.6009,
+       "eval_samples_per_second": 85.393,
+       "eval_steps_per_second": 10.681,
+       "step": 6000
+     },
+     {
+       "epoch": 4.321808510638298,
+       "grad_norm": 0.003993071615695953,
+       "learning_rate": 9.195478723404257e-06,
+       "loss": 0.0067,
+       "step": 6500
+     },
+     {
+       "epoch": 4.321808510638298,
+       "eval_f1": 0.747245409015025,
+       "eval_loss": 0.09123753011226654,
+       "eval_runtime": 17.6029,
+       "eval_samples_per_second": 85.384,
+       "eval_steps_per_second": 10.68,
+       "step": 6500
+     },
+     {
+       "epoch": 4.654255319148936,
+       "grad_norm": 0.002212055493146181,
+       "learning_rate": 8.36436170212766e-06,
+       "loss": 0.0084,
+       "step": 7000
+     },
+     {
+       "epoch": 4.654255319148936,
+       "eval_f1": 0.7503337783711616,
+       "eval_loss": 0.08904670178890228,
+       "eval_runtime": 17.6094,
+       "eval_samples_per_second": 85.352,
+       "eval_steps_per_second": 10.676,
+       "step": 7000
+     },
+     {
+       "epoch": 4.986702127659575,
+       "grad_norm": 0.010206693783402443,
+       "learning_rate": 7.5332446808510636e-06,
+       "loss": 0.0066,
+       "step": 7500
+     },
+     {
+       "epoch": 4.986702127659575,
+       "eval_f1": 0.7481629926519706,
+       "eval_loss": 0.09713348001241684,
+       "eval_runtime": 17.6091,
+       "eval_samples_per_second": 85.354,
+       "eval_steps_per_second": 10.676,
+       "step": 7500
+     },
+     {
+       "epoch": 5.319148936170213,
+       "grad_norm": 0.0045317914336919785,
+       "learning_rate": 6.702127659574469e-06,
+       "loss": 0.005,
+       "step": 8000
+     },
+     {
+       "epoch": 5.319148936170213,
+       "eval_f1": 0.7595865288429476,
+       "eval_loss": 0.0952615961432457,
+       "eval_runtime": 17.6003,
+       "eval_samples_per_second": 85.396,
+       "eval_steps_per_second": 10.682,
+       "step": 8000
+     },
+     {
+       "epoch": 5.651595744680851,
+       "grad_norm": 0.0027383090928196907,
+       "learning_rate": 5.871010638297873e-06,
+       "loss": 0.0032,
+       "step": 8500
+     },
+     {
+       "epoch": 5.651595744680851,
+       "eval_f1": 0.752435337588176,
+       "eval_loss": 0.10410240292549133,
+       "eval_runtime": 17.6009,
+       "eval_samples_per_second": 85.393,
+       "eval_steps_per_second": 10.681,
+       "step": 8500
+     },
+     {
+       "epoch": 5.98404255319149,
+       "grad_norm": 13.284900665283203,
+       "learning_rate": 5.039893617021277e-06,
+       "loss": 0.0034,
+       "step": 9000
+     },
+     {
+       "epoch": 5.98404255319149,
+       "eval_f1": 0.748834110592938,
+       "eval_loss": 0.10824441909790039,
+       "eval_runtime": 17.6038,
+       "eval_samples_per_second": 85.379,
+       "eval_steps_per_second": 10.68,
+       "step": 9000
+     },
+     {
+       "epoch": 6.316489361702128,
+       "grad_norm": 0.04516634717583656,
+       "learning_rate": 4.208776595744681e-06,
+       "loss": 0.003,
+       "step": 9500
+     },
+     {
+       "epoch": 6.316489361702128,
+       "eval_f1": 0.7523489932885906,
+       "eval_loss": 0.1027175560593605,
+       "eval_runtime": 17.6036,
+       "eval_samples_per_second": 85.38,
+       "eval_steps_per_second": 10.68,
+       "step": 9500
+     },
+     {
+       "epoch": 6.648936170212766,
+       "grad_norm": 0.0026772848796099424,
+       "learning_rate": 3.377659574468085e-06,
+       "loss": 0.0024,
+       "step": 10000
+     },
+     {
+       "epoch": 6.648936170212766,
+       "eval_f1": 0.7510829723425525,
+       "eval_loss": 0.10764423757791519,
+       "eval_runtime": 17.6186,
+       "eval_samples_per_second": 85.308,
+       "eval_steps_per_second": 10.671,
+       "step": 10000
+     },
+     {
+       "epoch": 6.9813829787234045,
+       "grad_norm": 0.0036624702624976635,
+       "learning_rate": 2.5465425531914894e-06,
+       "loss": 0.0018,
+       "step": 10500
+     },
+     {
+       "epoch": 6.9813829787234045,
+       "eval_f1": 0.7535845281760587,
+       "eval_loss": 0.10941769182682037,
+       "eval_runtime": 17.5982,
+       "eval_samples_per_second": 85.406,
+       "eval_steps_per_second": 10.683,
+       "step": 10500
+     },
+     {
+       "epoch": 7.3138297872340425,
+       "grad_norm": 0.0005808394053019583,
+       "learning_rate": 1.7154255319148937e-06,
+       "loss": 0.0016,
+       "step": 11000
+     },
+     {
+       "epoch": 7.3138297872340425,
+       "eval_f1": 0.7588510354041417,
+       "eval_loss": 0.11245912313461304,
+       "eval_runtime": 17.6057,
+       "eval_samples_per_second": 85.37,
+       "eval_steps_per_second": 10.678,
+       "step": 11000
+     },
+     {
+       "epoch": 7.6462765957446805,
+       "grad_norm": 0.0013273729709908366,
+       "learning_rate": 8.84308510638298e-07,
+       "loss": 0.0009,
+       "step": 11500
+     },
+     {
+       "epoch": 7.6462765957446805,
+       "eval_f1": 0.7561057209769153,
+       "eval_loss": 0.11294491589069366,
+       "eval_runtime": 17.6034,
+       "eval_samples_per_second": 85.381,
+       "eval_steps_per_second": 10.68,
+       "step": 11500
+     },
+     {
+       "epoch": 7.9787234042553195,
+       "grad_norm": 0.002082614693790674,
+       "learning_rate": 5.319148936170213e-08,
+       "loss": 0.0007,
+       "step": 12000
+     },
+     {
+       "epoch": 7.9787234042553195,
+       "eval_f1": 0.7553475935828877,
+       "eval_loss": 0.11238062381744385,
+       "eval_runtime": 17.6206,
+       "eval_samples_per_second": 85.298,
+       "eval_steps_per_second": 10.669,
+       "step": 12000
+     },
+     {
+       "epoch": 8.0,
+       "step": 12032,
+       "total_flos": 4.483356451828531e+16,
+       "train_loss": 0.01690695836027481,
+       "train_runtime": 2479.1734,
+       "train_samples_per_second": 19.403,
+       "train_steps_per_second": 4.853
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 12032,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 8,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 4.483356451828531e+16,
+   "train_batch_size": 4,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed0e0c0e86cad25a1fa7038639910f9b8ae3db8f7b86b1dd0745425af810f6ad
+ size 5240
vocab.json ADDED
The diff for this file is too large to render.