impossible-llms-french-mirror-reversal

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.5029
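
Because training used a label-smoothing factor of 0.1 (see the hyperparameters below), this loss is a smoothed cross-entropy, so exponentiating it overstates the model's true perplexity; still, exp(loss) gives a rough upper-bound figure. A minimal sketch:

```python
import math

eval_loss = 5.5029  # final evaluation loss reported above
# Rough upper bound on perplexity; the 0.1 label smoothing inflates the loss
# relative to plain cross-entropy, so the true perplexity is somewhat lower.
print(f"perplexity <= ~{math.exp(eval_loss):.1f}")  # ~245.4
```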

Model description

More information needed

Intended uses & limitations

More information needed
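
Pending details from the authors, the checkpoint should load through the standard transformers API. The sketch below is illustrative only: the hub ID is inferred from this card's title and author, and the causal language-model head is an assumption.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumptions: the checkpoint is a causal LM, and the repo ID matches this card.
model_id = "IParraMartin/impossible-llms-french-mirror-reversal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Bonjour, le monde", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```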

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
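
For reference, the effective batch size of 384 is the product of per-device batch size, device count, and accumulation steps: 12 × 4 × 8 = 384. Below is a minimal sketch of how these settings map onto transformers.TrainingArguments; output_dir is a placeholder and fp16=True is an assumption standing in for "Native AMP":

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="impossible-llms-french-mirror-reversal",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,  # x 4 GPUs x 8 accumulation steps = 384
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    max_steps=3000,                  # training_steps
    optim="adamw_torch",             # betas/epsilon above are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    label_smoothing_factor=0.1,
    fp16=True,                       # assumed from "Native AMP"
)
```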

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 80.0198 | 1.0 | 13 | 9.8432 |
| 73.8502 | 2.0 | 26 | 9.2010 |
| 70.1262 | 3.0 | 39 | 8.7313 |
| 66.9269 | 4.0 | 52 | 8.2876 |
| 63.0973 | 5.0 | 65 | 7.8440 |
| 60.2009 | 6.0 | 78 | 7.4241 |
| 56.4183 | 7.0 | 91 | 7.0040 |
| 53.2765 | 8.0 | 104 | 6.5948 |
| 50.3254 | 9.0 | 117 | 6.2393 |
| 47.7687 | 10.0 | 130 | 5.9410 |
| 46.419 | 11.0 | 143 | 5.7496 |
| 45.2453 | 12.0 | 156 | 5.6309 |
| 44.7207 | 13.0 | 169 | 5.5659 |
| 43.9129 | 14.0 | 182 | 5.5091 |
| 43.9687 | 15.0 | 195 | 5.4595 |
| 43.3349 | 16.0 | 208 | 5.4153 |
| 42.9744 | 17.0 | 221 | 5.3796 |
| 42.7889 | 18.0 | 234 | 5.3502 |
| 42.4452 | 19.0 | 247 | 5.3083 |
| 42.0081 | 20.0 | 260 | 5.2655 |
| 41.9865 | 21.0 | 273 | 5.2189 |
| 41.3489 | 22.0 | 286 | 5.1793 |
| 41.1289 | 23.0 | 299 | 5.1373 |
| 40.5837 | 24.0 | 312 | 5.0910 |
| 40.3428 | 25.0 | 325 | 5.0391 |
| 39.911 | 26.0 | 338 | 5.0087 |
| 39.5557 | 27.0 | 351 | 4.9616 |
| 39.2307 | 28.0 | 364 | 4.9194 |
| 38.9384 | 29.0 | 377 | 4.8871 |
| 38.4598 | 30.0 | 390 | 4.8515 |
| 38.5468 | 31.0 | 403 | 4.8203 |
| 37.9855 | 32.0 | 416 | 4.7963 |
| 37.4521 | 33.0 | 429 | 4.7680 |
| 37.4803 | 34.0 | 442 | 4.7387 |
| 37.3179 | 35.0 | 455 | 4.7130 |
| 37.0208 | 36.0 | 468 | 4.6944 |
| 36.7125 | 37.0 | 481 | 4.6631 |
| 36.7679 | 38.0 | 494 | 4.6495 |
| 36.5453 | 39.0 | 507 | 4.6328 |
| 35.787 | 40.0 | 520 | 4.6135 |
| 35.8911 | 41.0 | 533 | 4.5925 |
| 35.4433 | 42.0 | 546 | 4.5818 |
| 35.4365 | 43.0 | 559 | 4.5677 |
| 35.4899 | 44.0 | 572 | 4.5560 |
| 35.2766 | 45.0 | 585 | 4.5376 |
| 34.7728 | 46.0 | 598 | 4.5265 |
| 34.6248 | 47.0 | 611 | 4.5235 |
| 34.5533 | 48.0 | 624 | 4.5081 |
| 34.2996 | 49.0 | 637 | 4.5044 |
| 34.0523 | 50.0 | 650 | 4.4936 |
| 34.3822 | 51.0 | 663 | 4.4875 |
| 33.8804 | 52.0 | 676 | 4.4784 |
| 33.5867 | 53.0 | 689 | 4.4800 |
| 33.3544 | 54.0 | 702 | 4.4727 |
| 33.1125 | 55.0 | 715 | 4.4613 |
| 33.2388 | 56.0 | 728 | 4.4606 |
| 32.8318 | 57.0 | 741 | 4.4649 |
| 32.9857 | 58.0 | 754 | 4.4577 |
| 32.6205 | 59.0 | 767 | 4.4631 |
| 32.3408 | 60.0 | 780 | 4.4631 |
| 32.1289 | 61.0 | 793 | 4.4612 |
| 32.0066 | 62.0 | 806 | 4.4633 |
| 32.1644 | 63.0 | 819 | 4.4676 |
| 31.7361 | 64.0 | 832 | 4.4649 |
| 31.6261 | 65.0 | 845 | 4.4681 |
| 31.4179 | 66.0 | 858 | 4.4698 |
| 31.2779 | 67.0 | 871 | 4.4752 |
| 31.2652 | 68.0 | 884 | 4.4772 |
| 30.8694 | 69.0 | 897 | 4.4850 |
| 30.9367 | 70.0 | 910 | 4.4835 |
| 30.8774 | 71.0 | 923 | 4.4894 |
| 30.4974 | 72.0 | 936 | 4.5009 |
| 30.429 | 73.0 | 949 | 4.5056 |
| 30.0783 | 74.0 | 962 | 4.5110 |
| 30.0938 | 75.0 | 975 | 4.5112 |
| 30.0514 | 76.0 | 988 | 4.5217 |
| 29.9671 | 77.0 | 1001 | 4.5252 |
| 29.5798 | 78.0 | 1014 | 4.5342 |
| 29.4904 | 79.0 | 1027 | 4.5394 |
| 29.4252 | 80.0 | 1040 | 4.5593 |
| 29.1496 | 81.0 | 1053 | 4.5643 |
| 29.1626 | 82.0 | 1066 | 4.5717 |
| 28.9837 | 83.0 | 1079 | 4.5810 |
| 28.7902 | 84.0 | 1092 | 4.5841 |
| 28.7124 | 85.0 | 1105 | 4.6022 |
| 28.7215 | 86.0 | 1118 | 4.6020 |
| 28.3804 | 87.0 | 1131 | 4.6167 |
| 28.2634 | 88.0 | 1144 | 4.6284 |
| 28.1559 | 89.0 | 1157 | 4.6401 |
| 28.08 | 90.0 | 1170 | 4.6469 |
| 27.7806 | 91.0 | 1183 | 4.6496 |
| 27.6702 | 92.0 | 1196 | 4.6666 |
| 27.6931 | 93.0 | 1209 | 4.6786 |
| 27.5055 | 94.0 | 1222 | 4.6891 |
| 27.249 | 95.0 | 1235 | 4.7028 |
| 27.209 | 96.0 | 1248 | 4.7150 |
| 26.9527 | 97.0 | 1261 | 4.7217 |
| 27.0324 | 98.0 | 1274 | 4.7300 |
| 26.8408 | 99.0 | 1287 | 4.7466 |
| 26.6459 | 100.0 | 1300 | 4.7580 |
| 26.5751 | 101.0 | 1313 | 4.7717 |
| 26.441 | 102.0 | 1326 | 4.7723 |
| 26.2731 | 103.0 | 1339 | 4.7830 |
| 26.1837 | 104.0 | 1352 | 4.8001 |
| 26.1576 | 105.0 | 1365 | 4.8102 |
| 25.9416 | 106.0 | 1378 | 4.8197 |
| 25.7272 | 107.0 | 1391 | 4.8328 |
| 25.5827 | 108.0 | 1404 | 4.8460 |
| 25.5328 | 109.0 | 1417 | 4.8572 |
| 25.4334 | 110.0 | 1430 | 4.8625 |
| 25.3763 | 111.0 | 1443 | 4.8850 |
| 25.1549 | 112.0 | 1456 | 4.8907 |
| 25.2125 | 113.0 | 1469 | 4.9002 |
| 24.7817 | 114.0 | 1482 | 4.9170 |
| 24.8105 | 115.0 | 1495 | 4.9251 |
| 24.8834 | 116.0 | 1508 | 4.9305 |
| 24.6489 | 117.0 | 1521 | 4.9463 |
| 24.5391 | 118.0 | 1534 | 4.9639 |
| 24.4364 | 119.0 | 1547 | 4.9646 |
| 24.3963 | 120.0 | 1560 | 4.9733 |
| 24.1716 | 121.0 | 1573 | 4.9948 |
| 24.1119 | 122.0 | 1586 | 5.0018 |
| 23.9758 | 123.0 | 1599 | 5.0062 |
| 23.9645 | 124.0 | 1612 | 5.0189 |
| 23.7556 | 125.0 | 1625 | 5.0335 |
| 23.6545 | 126.0 | 1638 | 5.0391 |
| 23.6494 | 127.0 | 1651 | 5.0451 |
| 23.5182 | 128.0 | 1664 | 5.0568 |
| 23.4305 | 129.0 | 1677 | 5.0735 |
| 23.2204 | 130.0 | 1690 | 5.0867 |
| 23.2421 | 131.0 | 1703 | 5.0943 |
| 23.2145 | 132.0 | 1716 | 5.1013 |
| 23.208 | 133.0 | 1729 | 5.1090 |
| 23.0669 | 134.0 | 1742 | 5.1194 |
| 22.8859 | 135.0 | 1755 | 5.1288 |
| 22.8141 | 136.0 | 1768 | 5.1362 |
| 22.8662 | 137.0 | 1781 | 5.1489 |
| 22.7264 | 138.0 | 1794 | 5.1542 |
| 22.5898 | 139.0 | 1807 | 5.1642 |
| 22.6557 | 140.0 | 1820 | 5.1749 |
| 22.5124 | 141.0 | 1833 | 5.1820 |
| 22.3406 | 142.0 | 1846 | 5.1939 |
| 22.2505 | 143.0 | 1859 | 5.2026 |
| 22.2204 | 144.0 | 1872 | 5.2083 |
| 22.2833 | 145.0 | 1885 | 5.2155 |
| 22.147 | 146.0 | 1898 | 5.2226 |
| 21.9469 | 147.0 | 1911 | 5.2352 |
| 21.9386 | 148.0 | 1924 | 5.2402 |
| 21.872 | 149.0 | 1937 | 5.2445 |
| 21.8451 | 150.0 | 1950 | 5.2524 |
| 21.7768 | 151.0 | 1963 | 5.2608 |
| 21.8036 | 152.0 | 1976 | 5.2719 |
| 21.6397 | 153.0 | 1989 | 5.2794 |
| 21.6209 | 154.0 | 2002 | 5.2828 |
| 21.4504 | 155.0 | 2015 | 5.2919 |
| 21.513 | 156.0 | 2028 | 5.2945 |
| 21.3786 | 157.0 | 2041 | 5.2997 |
| 21.3505 | 158.0 | 2054 | 5.3078 |
| 21.2884 | 159.0 | 2067 | 5.3166 |
| 21.2095 | 160.0 | 2080 | 5.3258 |
| 21.1847 | 161.0 | 2093 | 5.3304 |
| 21.1161 | 162.0 | 2106 | 5.3315 |
| 21.1981 | 163.0 | 2119 | 5.3417 |
| 21.0559 | 164.0 | 2132 | 5.3458 |
| 20.9775 | 165.0 | 2145 | 5.3557 |
| 20.8902 | 166.0 | 2158 | 5.3575 |
| 20.8739 | 167.0 | 2171 | 5.3686 |
| 20.8402 | 168.0 | 2184 | 5.3714 |
| 20.7933 | 169.0 | 2197 | 5.3733 |
| 20.8702 | 170.0 | 2210 | 5.3808 |
| 20.6881 | 171.0 | 2223 | 5.3834 |
| 20.6678 | 172.0 | 2236 | 5.3896 |
| 20.6442 | 173.0 | 2249 | 5.3935 |
| 20.5788 | 174.0 | 2262 | 5.3987 |
| 20.6322 | 175.0 | 2275 | 5.4044 |
| 20.5513 | 176.0 | 2288 | 5.4058 |
| 20.5092 | 177.0 | 2301 | 5.4160 |
| 20.4759 | 178.0 | 2314 | 5.4203 |
| 20.382 | 179.0 | 2327 | 5.4229 |
| 20.3942 | 180.0 | 2340 | 5.4260 |
| 20.2126 | 181.0 | 2353 | 5.4334 |
| 20.2189 | 182.0 | 2366 | 5.4310 |
| 20.1824 | 183.0 | 2379 | 5.4377 |
| 20.3083 | 184.0 | 2392 | 5.4380 |
| 20.116 | 185.0 | 2405 | 5.4405 |
| 20.2211 | 186.0 | 2418 | 5.4468 |
| 20.2836 | 187.0 | 2431 | 5.4510 |
| 20.0954 | 188.0 | 2444 | 5.4528 |
| 20.1147 | 189.0 | 2457 | 5.4571 |
| 20.1607 | 190.0 | 2470 | 5.4591 |
| 20.161 | 191.0 | 2483 | 5.4616 |
| 20.0482 | 192.0 | 2496 | 5.4660 |
| 20.023 | 193.0 | 2509 | 5.4673 |
| 20.0252 | 194.0 | 2522 | 5.4720 |
| 19.9898 | 195.0 | 2535 | 5.4727 |
| 20.0842 | 196.0 | 2548 | 5.4745 |
| 19.9716 | 197.0 | 2561 | 5.4782 |
| 19.9702 | 198.0 | 2574 | 5.4773 |
| 19.9797 | 199.0 | 2587 | 5.4808 |
| 19.9872 | 200.0 | 2600 | 5.4823 |
| 19.9269 | 201.0 | 2613 | 5.4867 |
| 19.8922 | 202.0 | 2626 | 5.4861 |
| 19.8691 | 203.0 | 2639 | 5.4875 |
| 19.9356 | 204.0 | 2652 | 5.4870 |
| 19.7844 | 205.0 | 2665 | 5.4903 |
| 19.8478 | 206.0 | 2678 | 5.4917 |
| 19.8299 | 207.0 | 2691 | 5.4930 |
| 19.7174 | 208.0 | 2704 | 5.4931 |
| 19.7498 | 209.0 | 2717 | 5.4965 |
| 19.7161 | 210.0 | 2730 | 5.4964 |
| 19.6239 | 211.0 | 2743 | 5.4974 |
| 19.843 | 212.0 | 2756 | 5.4979 |
| 19.8373 | 213.0 | 2769 | 5.4983 |
| 19.6958 | 214.0 | 2782 | 5.4980 |
| 19.8598 | 215.0 | 2795 | 5.5000 |
| 19.6143 | 216.0 | 2808 | 5.5011 |
| 19.7373 | 217.0 | 2821 | 5.5003 |
| 19.681 | 218.0 | 2834 | 5.5021 |
| 19.7768 | 219.0 | 2847 | 5.5010 |
| 19.6681 | 220.0 | 2860 | 5.5016 |
| 19.6948 | 221.0 | 2873 | 5.5028 |
| 19.6948 | 222.0 | 2886 | 5.5022 |
| 19.6591 | 223.0 | 2899 | 5.5025 |
| 19.6224 | 224.0 | 2912 | 5.5025 |
| 19.7192 | 225.0 | 2925 | 5.5030 |
| 19.6382 | 226.0 | 2938 | 5.5028 |
| 19.6482 | 227.0 | 2951 | 5.5027 |
| 19.7418 | 228.0 | 2964 | 5.5028 |
| 19.7149 | 229.0 | 2977 | 5.5029 |
| 19.6127 | 230.0 | 2990 | 5.5029 |
| 19.7064 | 230.7692 | 3000 | 5.5029 |
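
The validation loss reaches its minimum of 4.4577 at epoch 58 (step 754) and rises steadily afterwards while the training loss keeps falling, which suggests the final checkpoint is well past the best-generalizing one. If the best validation checkpoint is preferred over the final weights, the Trainer can track it directly; a minimal sketch, with the evaluation/save strategies assumed rather than taken from this run:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Assumed settings: this run logged one evaluation per epoch, so "epoch"
# strategies are a natural fit, but they are not confirmed by the card.
training_args = TrainingArguments(
    output_dir="impossible-llms-french-mirror-reversal",  # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Then pass callbacks=[EarlyStoppingCallback(early_stopping_patience=10)]
# to Trainer to stop once the validation loss has stopped improving.
```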

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0