---
library_name: peft
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- llama-factory
- lntuning
- generated_from_trainer
model-index:
- name: train_stsb_1745333597
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# train_stsb_1745333597

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on the stsb dataset.
It achieves the following results on the evaluation set:
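- Loss: 0.2672 (matches the lowest validation loss in the training results, logged at step 31800)
- Num Input Tokens Seen: 61177152 (cumulative total at the final step, 40000)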

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- training_steps: 40000
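
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and evaluates a cosine learning-rate curve over the 40000 training steps. The device count and the absence of warmup are assumptions, since the card lists neither; the constant and function names are illustrative and not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
TRAINING_STEPS = 40000

# Assumptions (not stated in the card): a single device and no warmup.
NUM_DEVICES = 1
WARMUP_STEPS = 0


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Half-cosine decay from LEARNING_RATE down to 0 over TRAINING_STEPS."""
    if step < WARMUP_STEPS:
        # Linear warmup branch; unused when WARMUP_STEPS is 0.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 10000, 20000, 30000, 40000):
        print(step, cosine_lr(step))
```

Under these assumptions the effective batch size agrees with the reported total_train_batch_size of 16, and the learning rate falls to half its initial value at the midpoint of training.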

### Training results

| Training Loss | Epoch    | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 1.585         | 0.6182   | 200   | 1.6706          | 304960            |
| 0.8688        | 1.2349   | 400   | 0.9288          | 610112            |
| 0.7313        | 1.8532   | 600   | 0.7213          | 918240            |
| 0.5594        | 2.4699   | 800   | 0.6223          | 1223440           |
| 0.5123        | 3.0866   | 1000  | 0.5666          | 1529568           |
| 0.6423        | 3.7048   | 1200  | 0.5304          | 1838464           |
| 0.4786        | 4.3215   | 1400  | 0.5007          | 2144560           |
| 0.4309        | 4.9397   | 1600  | 0.4772          | 2450736           |
| 0.3631        | 5.5564   | 1800  | 0.4542          | 2755856           |
| 0.3999        | 6.1731   | 2000  | 0.4361          | 3063440           |
| 0.3943        | 6.7913   | 2200  | 0.4218          | 3368976           |
| 0.3871        | 7.4080   | 2400  | 0.4086          | 3677040           |
| 0.301         | 8.0247   | 2600  | 0.3979          | 3983872           |
| 0.402         | 8.6430   | 2800  | 0.3875          | 4292480           |
| 0.3267        | 9.2597   | 3000  | 0.3802          | 4594560           |
| 0.304         | 9.8779   | 3200  | 0.3731          | 4900544           |
| 0.2844        | 10.4946  | 3400  | 0.3671          | 5206928           |
| 0.2721        | 11.1113  | 3600  | 0.3614          | 5511472           |
| 0.3163        | 11.7295  | 3800  | 0.3571          | 5815280           |
| 0.2565        | 12.3462  | 4000  | 0.3520          | 6122240           |
| 0.3282        | 12.9645  | 4200  | 0.3491          | 6427616           |
| 0.2975        | 13.5811  | 4400  | 0.3457          | 6733776           |
| 0.2807        | 14.1978  | 4600  | 0.3424          | 7038848           |
| 0.3327        | 14.8161  | 4800  | 0.3407          | 7344384           |
| 0.2291        | 15.4328  | 5000  | 0.3364          | 7651280           |
| 0.2319        | 16.0495  | 5200  | 0.3341          | 7955504           |
| 0.2366        | 16.6677  | 5400  | 0.3320          | 8262864           |
| 0.2756        | 17.2844  | 5600  | 0.3281          | 8568256           |
| 0.2911        | 17.9026  | 5800  | 0.3263          | 8873856           |
| 0.2603        | 18.5193  | 6000  | 0.3249          | 9180288           |
| 0.2363        | 19.1360  | 6200  | 0.3226          | 9486288           |
| 0.2801        | 19.7543  | 6400  | 0.3202          | 9792720           |
| 0.2232        | 20.3709  | 6600  | 0.3184          | 10100576          |
| 0.2155        | 20.9892  | 6800  | 0.3165          | 10406848          |
| 0.2464        | 21.6059  | 7000  | 0.3152          | 10713296          |
| 0.2812        | 22.2226  | 7200  | 0.3126          | 11016800          |
| 0.2069        | 22.8408  | 7400  | 0.3121          | 11325536          |
| 0.2316        | 23.4575  | 7600  | 0.3092          | 11631392          |
| 0.2594        | 24.0742  | 7800  | 0.3088          | 11936144          |
| 0.2298        | 24.6924  | 8000  | 0.3072          | 12244560          |
| 0.2299        | 25.3091  | 8200  | 0.3063          | 12549728          |
| 0.2479        | 25.9274  | 8400  | 0.3049          | 12858400          |
| 0.2615        | 26.5440  | 8600  | 0.3026          | 13163216          |
| 0.2499        | 27.1607  | 8800  | 0.3028          | 13469440          |
| 0.2869        | 27.7790  | 9000  | 0.3005          | 13774400          |
| 0.2175        | 28.3957  | 9200  | 0.2990          | 14082512          |
| 0.2517        | 29.0124  | 9400  | 0.2969          | 14385408          |
| 0.2257        | 29.6306  | 9600  | 0.2969          | 14692096          |
| 0.2787        | 30.2473  | 9800  | 0.2941          | 14996480          |
| 0.244         | 30.8655  | 10000 | 0.2930          | 15302624          |
| 0.2476        | 31.4822  | 10200 | 0.2940          | 15609936          |
| 0.2283        | 32.0989  | 10400 | 0.2914          | 15915040          |
| 0.2542        | 32.7172  | 10600 | 0.2917          | 16222112          |
| 0.1897        | 33.3338  | 10800 | 0.2906          | 16525360          |
| 0.2081        | 33.9521  | 11000 | 0.2892          | 16833040          |
| 0.1956        | 34.5688  | 11200 | 0.2886          | 17138928          |
| 0.2216        | 35.1855  | 11400 | 0.2881          | 17446224          |
| 0.2565        | 35.8037  | 11600 | 0.2875          | 17754192          |
| 0.2367        | 36.4204  | 11800 | 0.2870          | 18056816          |
| 0.218         | 37.0371  | 12000 | 0.2860          | 18365904          |
| 0.2416        | 37.6553  | 12200 | 0.2870          | 18669424          |
| 0.2194        | 38.2720  | 12400 | 0.2838          | 18975680          |
| 0.2615        | 38.8903  | 12600 | 0.2848          | 19284128          |
| 0.2299        | 39.5070  | 12800 | 0.2833          | 19589440          |
| 0.2019        | 40.1236  | 13000 | 0.2837          | 19892304          |
| 0.2313        | 40.7419  | 13200 | 0.2832          | 20201904          |
| 0.2054        | 41.3586  | 13400 | 0.2817          | 20507296          |
| 0.2196        | 41.9768  | 13600 | 0.2827          | 20814240          |
| 0.2243        | 42.5935  | 13800 | 0.2816          | 21117472          |
| 0.2293        | 43.2102  | 14000 | 0.2812          | 21424352          |
| 0.2336        | 43.8284  | 14200 | 0.2808          | 21729344          |
| 0.2159        | 44.4451  | 14400 | 0.2797          | 22035168          |
| 0.2103        | 45.0618  | 14600 | 0.2822          | 22341904          |
| 0.1983        | 45.6801  | 14800 | 0.2792          | 22646640          |
| 0.1967        | 46.2968  | 15000 | 0.2792          | 22952944          |
| 0.2025        | 46.9150  | 15200 | 0.2787          | 23260240          |
| 0.2189        | 47.5317  | 15400 | 0.2775          | 23566048          |
| 0.1989        | 48.1484  | 15600 | 0.2789          | 23871504          |
| 0.2509        | 48.7666  | 15800 | 0.2784          | 24175696          |
| 0.2322        | 49.3833  | 16000 | 0.2776          | 24480832          |
| 0.1908        | 50.0     | 16200 | 0.2772          | 24786896          |
| 0.2339        | 50.6182  | 16400 | 0.2766          | 25092208          |
| 0.2459        | 51.2349  | 16600 | 0.2752          | 25398288          |
| 0.2095        | 51.8532  | 16800 | 0.2773          | 25707024          |
| 0.2175        | 52.4699  | 17000 | 0.2757          | 26010848          |
| 0.2199        | 53.0866  | 17200 | 0.2758          | 26319616          |
| 0.2214        | 53.7048  | 17400 | 0.2761          | 26623232          |
| 0.1734        | 54.3215  | 17600 | 0.2755          | 26932512          |
| 0.1936        | 54.9397  | 17800 | 0.2750          | 27238304          |
| 0.2362        | 55.5564  | 18000 | 0.2747          | 27542688          |
| 0.2488        | 56.1731  | 18200 | 0.2735          | 27848608          |
| 0.2768        | 56.7913  | 18400 | 0.2747          | 28156128          |
| 0.2003        | 57.4080  | 18600 | 0.2731          | 28463824          |
| 0.2049        | 58.0247  | 18800 | 0.2735          | 28768304          |
| 0.2042        | 58.6430  | 19000 | 0.2742          | 29076400          |
| 0.1949        | 59.2597  | 19200 | 0.2723          | 29381968          |
| 0.2597        | 59.8779  | 19400 | 0.2728          | 29688144          |
| 0.1911        | 60.4946  | 19600 | 0.2727          | 29993744          |
| 0.2989        | 61.1113  | 19800 | 0.2730          | 30299024          |
| 0.2307        | 61.7295  | 20000 | 0.2713          | 30604816          |
| 0.2132        | 62.3462  | 20200 | 0.2711          | 30909520          |
| 0.2025        | 62.9645  | 20400 | 0.2708          | 31217744          |
| 0.1913        | 63.5811  | 20600 | 0.2718          | 31523296          |
| 0.2067        | 64.1978  | 20800 | 0.2716          | 31827424          |
| 0.21          | 64.8161  | 21000 | 0.2715          | 32135904          |
| 0.2766        | 65.4328  | 21200 | 0.2720          | 32439120          |
| 0.2451        | 66.0495  | 21400 | 0.2702          | 32747712          |
| 0.2197        | 66.6677  | 21600 | 0.2712          | 33052672          |
| 0.194         | 67.2844  | 21800 | 0.2714          | 33358560          |
| 0.3033        | 67.9026  | 22000 | 0.2706          | 33664736          |
| 0.2009        | 68.5193  | 22200 | 0.2703          | 33967392          |
| 0.2498        | 69.1360  | 22400 | 0.2704          | 34272592          |
| 0.1621        | 69.7543  | 22600 | 0.2701          | 34578896          |
| 0.195         | 70.3709  | 22800 | 0.2705          | 34883440          |
| 0.1973        | 70.9892  | 23000 | 0.2705          | 35188496          |
| 0.1933        | 71.6059  | 23200 | 0.2704          | 35492880          |
| 0.2729        | 72.2226  | 23400 | 0.2700          | 35798304          |
| 0.1747        | 72.8408  | 23600 | 0.2696          | 36105856          |
| 0.2           | 73.4575  | 23800 | 0.2707          | 36408816          |
| 0.2111        | 74.0742  | 24000 | 0.2704          | 36716560          |
| 0.2246        | 74.6924  | 24200 | 0.2701          | 37025168          |
| 0.2221        | 75.3091  | 24400 | 0.2697          | 37330368          |
| 0.1618        | 75.9274  | 24600 | 0.2701          | 37636736          |
| 0.2182        | 76.5440  | 24800 | 0.2708          | 37941312          |
| 0.1839        | 77.1607  | 25000 | 0.2692          | 38246144          |
| 0.2182        | 77.7790  | 25200 | 0.2696          | 38552576          |
| 0.2574        | 78.3957  | 25400 | 0.2693          | 38857104          |
| 0.2218        | 79.0124  | 25600 | 0.2694          | 39165040          |
| 0.195         | 79.6306  | 25800 | 0.2697          | 39472304          |
| 0.2723        | 80.2473  | 26000 | 0.2694          | 39777616          |
| 0.1793        | 80.8655  | 26200 | 0.2686          | 40084368          |
| 0.2347        | 81.4822  | 26400 | 0.2695          | 40388032          |
| 0.2275        | 82.0989  | 26600 | 0.2689          | 40694320          |
| 0.2472        | 82.7172  | 26800 | 0.2694          | 41001712          |
| 0.1955        | 83.3338  | 27000 | 0.2695          | 41305200          |
| 0.2043        | 83.9521  | 27200 | 0.2689          | 41615216          |
| 0.2068        | 84.5688  | 27400 | 0.2686          | 41920400          |
| 0.1841        | 85.1855  | 27600 | 0.2688          | 42224944          |
| 0.2023        | 85.8037  | 27800 | 0.2685          | 42528304          |
| 0.2246        | 86.4204  | 28000 | 0.2689          | 42836528          |
| 0.2481        | 87.0371  | 28200 | 0.2688          | 43141440          |
| 0.2264        | 87.6553  | 28400 | 0.2689          | 43445216          |
| 0.2422        | 88.2720  | 28600 | 0.2690          | 43750304          |
| 0.2099        | 88.8903  | 28800 | 0.2694          | 44055584          |
| 0.184         | 89.5070  | 29000 | 0.2691          | 44361616          |
| 0.1706        | 90.1236  | 29200 | 0.2688          | 44665936          |
| 0.1789        | 90.7419  | 29400 | 0.2687          | 44972144          |
| 0.1712        | 91.3586  | 29600 | 0.2686          | 45276416          |
| 0.2374        | 91.9768  | 29800 | 0.2673          | 45583712          |
| 0.2056        | 92.5935  | 30000 | 0.2682          | 45888688          |
| 0.2039        | 93.2102  | 30200 | 0.2687          | 46195456          |
| 0.2168        | 93.8284  | 30400 | 0.2687          | 46500288          |
| 0.1978        | 94.4451  | 30600 | 0.2686          | 46804992          |
| 0.1926        | 95.0618  | 30800 | 0.2681          | 47112576          |
| 0.1924        | 95.6801  | 31000 | 0.2681          | 47418816          |
| 0.2219        | 96.2968  | 31200 | 0.2682          | 47723232          |
| 0.1808        | 96.9150  | 31400 | 0.2676          | 48029888          |
| 0.2036        | 97.5317  | 31600 | 0.2682          | 48335504          |
| 0.1953        | 98.1484  | 31800 | 0.2672          | 48640352          |
| 0.1881        | 98.7666  | 32000 | 0.2678          | 48945632          |
| 0.2039        | 99.3833  | 32200 | 0.2684          | 49253952          |
| 0.2225        | 100.0    | 32400 | 0.2686          | 49557760          |
| 0.1741        | 100.6182 | 32600 | 0.2682          | 49863392          |
| 0.215         | 101.2349 | 32800 | 0.2683          | 50171184          |
| 0.2238        | 101.8532 | 33000 | 0.2677          | 50477424          |
| 0.1865        | 102.4699 | 33200 | 0.2690          | 50781472          |
| 0.2435        | 103.0866 | 33400 | 0.2682          | 51085008          |
| 0.2154        | 103.7048 | 33600 | 0.2681          | 51393296          |
| 0.2316        | 104.3215 | 33800 | 0.2682          | 51697808          |
| 0.2218        | 104.9397 | 34000 | 0.2679          | 52004880          |
| 0.2111        | 105.5564 | 34200 | 0.2678          | 52308944          |
| 0.1829        | 106.1731 | 34400 | 0.2675          | 52616512          |
| 0.2021        | 106.7913 | 34600 | 0.2681          | 52921600          |
| 0.1654        | 107.4080 | 34800 | 0.2681          | 53227040          |
| 0.2224        | 108.0247 | 35000 | 0.2683          | 53533488          |
| 0.1734        | 108.6430 | 35200 | 0.2684          | 53838704          |
| 0.216         | 109.2597 | 35400 | 0.2679          | 54143984          |
| 0.2126        | 109.8779 | 35600 | 0.2679          | 54449808          |
| 0.1928        | 110.4946 | 35800 | 0.2683          | 54754304          |
| 0.2473        | 111.1113 | 36000 | 0.2687          | 55060864          |
| 0.1758        | 111.7295 | 36200 | 0.2683          | 55367296          |
| 0.18          | 112.3462 | 36400 | 0.2680          | 55670672          |
| 0.2415        | 112.9645 | 36600 | 0.2685          | 55978256          |
| 0.2101        | 113.5811 | 36800 | 0.2682          | 56283024          |
| 0.1895        | 114.1978 | 37000 | 0.2678          | 56590928          |
| 0.2001        | 114.8161 | 37200 | 0.2679          | 56897936          |
| 0.1694        | 115.4328 | 37400 | 0.2681          | 57200192          |
| 0.2303        | 116.0495 | 37600 | 0.2680          | 57505872          |
| 0.1875        | 116.6677 | 37800 | 0.2680          | 57811120          |
| 0.1778        | 117.2844 | 38000 | 0.2679          | 58116320          |
| 0.2041        | 117.9026 | 38200 | 0.2685          | 58425376          |
| 0.2168        | 118.5193 | 38400 | 0.2682          | 58732208          |
| 0.2001        | 119.1360 | 38600 | 0.2681          | 59038688          |
| 0.1972        | 119.7543 | 38800 | 0.2686          | 59342656          |
| 0.1867        | 120.3709 | 39000 | 0.2676          | 59647664          |
| 0.2079        | 120.9892 | 39200 | 0.2682          | 59954128          |
| 0.2577        | 121.6059 | 39400 | 0.2679          | 60260256          |
| 0.1986        | 122.2226 | 39600 | 0.2684          | 60563120          |
| 0.1949        | 122.8408 | 39800 | 0.2681          | 60870320          |
| 0.1989        | 123.4575 | 40000 | 0.2680          | 61177152          |
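
The table can also be read programmatically. The following sketch uses a handful of rows copied from the table above to find the logged step with the lowest validation loss and to derive approximate steps per epoch and tokens per step. The tuple layout and helper names are illustrative assumptions, not part of the training code.

```python
# (step, epoch, validation_loss, input_tokens_seen) rows copied from the table above.
ROWS = [
    (200, 0.6182, 1.6706, 304960),
    (16200, 50.0, 0.2772, 24786896),
    (29800, 91.9768, 0.2673, 45583712),
    (31800, 98.1484, 0.2672, 48640352),
    (40000, 123.4575, 0.2680, 61177152),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def steps_per_epoch(row):
    """Optimizer steps per epoch implied by one logged row."""
    return row[0] / row[1]


def tokens_per_step(row):
    """Average input tokens consumed per optimizer step up to this row."""
    return row[3] / row[0]


if __name__ == "__main__":
    step, epoch, val_loss, _ = best_row(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
    print("steps per epoch:", steps_per_epoch(ROWS[1]))
    print("tokens per step:", tokens_per_step(ROWS[-1]))
```

Among these sampled rows, the minimum falls at step 31800, which matches the loss reported at the top of the card. Validation loss changes only in the third or fourth decimal place over the last several thousand steps.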


### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1