impossible-llms-french-fronting-n

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6305
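
For reference, the final evaluation loss of 5.6305 corresponds to a perplexity of roughly exp(5.6305) ≈ 279. Training used label smoothing (factor 0.1, listed below), which inflates the reported cross-entropy, so this is only an approximate figure. A minimal sketch of the conversion:

```python
import math

eval_loss = 5.6305                 # final validation loss reported above
perplexity = math.exp(eval_loss)   # ~279; approximate, since label smoothing inflates the loss
print(f"approximate perplexity: {perplexity:.1f}")
```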

Model description

More information needed. The uploaded checkpoint contains roughly 126M parameters stored as float32 Safetensors.

Intended uses & limitations

More information needed
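
The card does not include a usage example. As a minimal sketch, assuming the checkpoint in the IParraMartin/impossible-llms-french-fronting-n repository is a standard causal language model loadable with the Transformers auto classes, loading and scoring a sentence might look like this (the repo id and example sentence are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from this card's collection; adjust if the checkpoint
# is hosted elsewhere or requires a custom tokenizer.
repo_id = "IParraMartin/impossible-llms-french-fronting-n"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Score a French sentence under the language model (lower loss = more probable).
inputs = tokenizer("Le chat dort sur le canapé.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(f"per-token loss: {outputs.loss.item():.4f}")
```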

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
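
As a rough guide, these settings map onto a Hugging Face TrainingArguments configuration along the following lines. This is an illustrative sketch only (the original training script is not part of the card); the effective batch size of 384 follows from 12 per device × 4 GPUs × 8 gradient-accumulation steps.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the settings listed above, not the authors' actual script.
training_args = TrainingArguments(
    output_dir="impossible-llms-french-fronting-n",
    learning_rate=1e-4,
    per_device_train_batch_size=12,   # 12 x 4 GPUs x 8 accumulation steps = 384 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                        # "Native AMP" mixed precision (assumed fp16)
    label_smoothing_factor=0.1,
)
```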

Training results

Training Loss Epoch Step Validation Loss
79.3709 0.9412 12 9.7045
72.7126 1.9412 24 8.9774
68.6345 2.9412 36 8.4702
65.2741 3.9412 48 8.0744
62.3282 4.9412 60 7.7026
59.0381 5.9412 72 7.3473
56.5548 6.9412 84 6.9984
53.6941 7.9412 96 6.6513
50.9433 8.9412 108 6.3196
48.5468 9.9412 120 6.0200
46.7525 10.9412 132 5.7827
45.0493 11.9412 144 5.6299
44.1773 12.9412 156 5.5251
43.5938 13.9412 168 5.4614
43.2505 14.9412 180 5.3976
42.5248 15.9412 192 5.3463
42.7323 16.9412 204 5.3061
41.8438 17.9412 216 5.2640
41.8017 18.9412 228 5.2331
41.5583 19.9412 240 5.2129
41.1981 20.9412 252 5.1867
41.1009 21.9412 264 5.1471
40.7337 22.9412 276 5.1205
40.3246 23.9412 288 5.0902
40.2823 24.9412 300 5.0568
39.6541 25.9412 312 5.0338
39.8079 26.9412 324 5.0062
39.4075 27.9412 336 4.9785
38.7983 28.9412 348 4.9546
39.0204 29.9412 360 4.9298
38.9579 30.9412 372 4.9039
38.399 31.9412 384 4.8832
38.4498 32.9412 396 4.8599
38.0357 33.9412 408 4.8382
38.0998 34.9412 420 4.8142
37.7551 35.9412 432 4.7946
37.2101 36.9412 444 4.7754
37.2717 37.9412 456 4.7533
37.1984 38.9412 468 4.7316
36.7187 39.9412 480 4.7158
36.7248 40.9412 492 4.6993
36.2434 41.9412 504 4.6772
36.2787 42.9412 516 4.6629
36.1931 43.9412 528 4.6502
36.059 44.9412 540 4.6344
35.9529 45.9412 552 4.6207
35.6028 46.9412 564 4.6060
35.4822 47.9412 576 4.6018
35.2229 48.9412 588 4.5832
35.0409 49.9412 600 4.5758
34.9871 50.9412 612 4.5642
34.6724 51.9412 624 4.5540
34.6283 52.9412 636 4.5508
34.1214 53.9412 648 4.5419
34.6346 54.9412 660 4.5355
34.1235 55.9412 672 4.5285
33.8894 56.9412 684 4.5269
34.0441 57.9412 696 4.5185
33.87 58.9412 708 4.5139
33.638 59.9412 720 4.5077
33.3845 60.9412 732 4.5063
33.3044 61.9412 744 4.5124
33.153 62.9412 756 4.5055
32.9362 63.9412 768 4.5022
32.8092 64.9412 780 4.5050
32.6733 65.9412 792 4.5019
32.7524 66.9412 804 4.5057
32.5161 67.9412 816 4.5005
32.2748 68.9412 828 4.5093
32.0324 69.9412 840 4.5078
31.9758 70.9412 852 4.5174
31.6784 71.9412 864 4.5163
31.5865 72.9412 876 4.5188
31.6343 73.9412 888 4.5243
31.1777 74.9412 900 4.5303
30.9053 75.9412 912 4.5351
31.1924 76.9412 924 4.5402
31.1002 77.9412 936 4.5498
30.8247 78.9412 948 4.5503
30.7879 79.9412 960 4.5589
30.5964 80.9412 972 4.5675
30.4823 81.9412 984 4.5696
30.3273 82.9412 996 4.5739
30.1613 83.9412 1008 4.5899
29.7599 84.9412 1020 4.5945
29.7854 85.9412 1032 4.6096
29.6273 86.9412 1044 4.6106
29.5197 87.9412 1056 4.6212
29.712 88.9412 1068 4.6339
29.4239 89.9412 1080 4.6416
29.151 90.9412 1092 4.6544
28.841 91.9412 1104 4.6626
28.9508 92.9412 1116 4.6667
28.9138 93.9412 1128 4.6784
28.8629 94.9412 1140 4.6942
28.4463 95.9412 1152 4.6963
28.3226 96.9412 1164 4.7093
27.9057 97.9412 1176 4.7185
28.0171 98.9412 1188 4.7351
28.0574 99.9412 1200 4.7429
27.9451 100.9412 1212 4.7547
27.6435 101.9412 1224 4.7711
27.5748 102.9412 1236 4.7745
27.5951 103.9412 1248 4.7869
27.2009 104.9412 1260 4.8014
27.1234 105.9412 1272 4.8097
27.0729 106.9412 1284 4.8280
27.0189 107.9412 1296 4.8291
26.9423 108.9412 1308 4.8335
26.6313 109.9412 1320 4.8591
26.666 110.9412 1332 4.8699
26.5822 111.9412 1344 4.8775
26.3443 112.9412 1356 4.8868
26.374 113.9412 1368 4.8992
26.2428 114.9412 1380 4.9065
25.7932 115.9412 1392 4.9194
25.9217 116.9412 1404 4.9292
25.7235 117.9412 1416 4.9478
25.6739 118.9412 1428 4.9529
25.6198 119.9412 1440 4.9629
25.4504 120.9412 1452 4.9822
25.3776 121.9412 1464 4.9881
25.2885 122.9412 1476 5.0047
25.2783 123.9412 1488 5.0104
25.0042 124.9412 1500 5.0185
24.9964 125.9412 1512 5.0327
24.8719 126.9412 1524 5.0395
24.6866 127.9412 1536 5.0503
24.5244 128.9412 1548 5.0640
24.6366 129.9412 1560 5.0765
24.478 130.9412 1572 5.0820
24.3425 131.9412 1584 5.0963
24.3417 132.9412 1596 5.1094
24.2658 133.9412 1608 5.1135
24.0955 134.9412 1620 5.1284
24.0612 135.9412 1632 5.1331
23.9143 136.9412 1644 5.1502
23.8829 137.9412 1656 5.1525
23.9453 138.9412 1668 5.1616
23.5645 139.9412 1680 5.1758
23.6355 140.9412 1692 5.1797
23.5018 141.9412 1704 5.1912
23.2835 142.9412 1716 5.2100
23.33 143.9412 1728 5.2179
23.2781 144.9412 1740 5.2249
23.2144 145.9412 1752 5.2319
23.0582 146.9412 1764 5.2466
22.9711 147.9412 1776 5.2532
22.9845 148.9412 1788 5.2559
22.7305 149.9412 1800 5.2628
22.7762 150.9412 1812 5.2769
22.6747 151.9412 1824 5.2869
22.5236 152.9412 1836 5.2955
22.6058 153.9412 1848 5.2953
22.5488 154.9412 1860 5.3057
22.4842 155.9412 1872 5.3166
22.3547 156.9412 1884 5.3223
22.2193 157.9412 1896 5.3339
22.4118 158.9412 1908 5.3421
22.1839 159.9412 1920 5.3475
21.9919 160.9412 1932 5.3563
22.1601 161.9412 1944 5.3630
21.9951 162.9412 1956 5.3704
21.8319 163.9412 1968 5.3797
21.8867 164.9412 1980 5.3856
21.7331 165.9412 1992 5.3916
21.7016 166.9412 2004 5.4028
21.7451 167.9412 2016 5.4036
21.6868 168.9412 2028 5.4117
21.5965 169.9412 2040 5.4183
21.6384 170.9412 2052 5.4259
21.3644 171.9412 2064 5.4331
21.4688 172.9412 2076 5.4402
21.4039 173.9412 2088 5.4375
21.2316 174.9412 2100 5.4499
21.3216 175.9412 2112 5.4528
21.289 176.9412 2124 5.4597
21.2134 177.9412 2136 5.4633
21.0559 178.9412 2148 5.4730
21.1651 179.9412 2160 5.4767
21.1043 180.9412 2172 5.4801
21.1069 181.9412 2184 5.4881
20.9558 182.9412 2196 5.4921
20.9238 183.9412 2208 5.4981
20.9508 184.9412 2220 5.5056
20.7217 185.9412 2232 5.5096
20.9891 186.9412 2244 5.5113
20.9357 187.9412 2256 5.5182
20.8073 188.9412 2268 5.5194
20.7575 189.9412 2280 5.5230
20.767 190.9412 2292 5.5304
20.7004 191.9412 2304 5.5370
20.7248 192.9412 2316 5.5379
20.5998 193.9412 2328 5.5433
20.5258 194.9412 2340 5.5488
20.5745 195.9412 2352 5.5488
20.4786 196.9412 2364 5.5560
20.3979 197.9412 2376 5.5562
20.4767 198.9412 2388 5.5633
20.4732 199.9412 2400 5.5669
20.4404 200.9412 2412 5.5696
20.3467 201.9412 2424 5.5722
20.3103 202.9412 2436 5.5749
20.3808 203.9412 2448 5.5793
20.3694 204.9412 2460 5.5817
20.3105 205.9412 2472 5.5853
20.2944 206.9412 2484 5.5844
20.243 207.9412 2496 5.5905
20.2056 208.9412 2508 5.5907
20.2136 209.9412 2520 5.5933
20.147 210.9412 2532 5.5960
19.9896 211.9412 2544 5.6000
20.215 212.9412 2556 5.5980
20.1486 213.9412 2568 5.6041
20.0757 214.9412 2580 5.6038
20.0677 215.9412 2592 5.6054
20.0843 216.9412 2604 5.6096
20.2252 217.9412 2616 5.6091
19.9871 218.9412 2628 5.6114
20.0075 219.9412 2640 5.6145
20.0858 220.9412 2652 5.6165
20.0257 221.9412 2664 5.6185
19.9539 222.9412 2676 5.6190
20.0972 223.9412 2688 5.6204
19.9998 224.9412 2700 5.6197
20.0507 225.9412 2712 5.6218
19.982 226.9412 2724 5.6219
19.9895 227.9412 2736 5.6241
19.9444 228.9412 2748 5.6238
19.8426 229.9412 2760 5.6253
19.9946 230.9412 2772 5.6252
19.8973 231.9412 2784 5.6255
19.9125 232.9412 2796 5.6282
20.0479 233.9412 2808 5.6275
19.9772 234.9412 2820 5.6285
19.9675 235.9412 2832 5.6281
19.9506 236.9412 2844 5.6290
19.8771 237.9412 2856 5.6300
19.8583 238.9412 2868 5.6290
19.9374 239.9412 2880 5.6298
19.9017 240.9412 2892 5.6305
19.7786 241.9412 2904 5.6305
19.8189 242.9412 2916 5.6305
19.8966 243.9412 2928 5.6305
19.8309 244.9412 2940 5.6304
19.906 245.9412 2952 5.6305
19.8537 246.9412 2964 5.6305
19.8064 247.9412 2976 5.6305
19.9354 248.9412 2988 5.6305
19.8992 249.9412 3000 5.6305

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0