impossible-llms-french-fronting-bigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.7228
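
For rough intuition, the cross-entropy loss can be read as a perplexity via exp(loss): exp(5.7228) ≈ 306. Note that training used a label-smoothing factor of 0.1 (see the hyperparameters below), which inflates the reported loss relative to an unsmoothed cross-entropy, so this conversion is only approximate.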

Model description

More information needed

Intended uses & limitations

More information needed
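
No specific usage guidance is provided. As a minimal sketch, assuming the checkpoint published at IParraMartin/impossible-llms-french-fronting-bigram is a standard causal language model loadable with the Transformers Auto classes, it could be queried like this:

```python
# Minimal loading/scoring sketch. Assumes the Hub repo id below hosts a causal LM
# (and its tokenizer) compatible with the standard Auto classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-french-fronting-bigram"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Score a French sentence by its average per-token cross-entropy under the model.
inputs = tokenizer("Le chat dort sur le canapé.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(f"loss: {outputs.loss.item():.4f}")
```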

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
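
As a rough sketch, the hyperparameters above map onto a Transformers TrainingArguments configuration roughly as follows (dataset, model, and Trainer setup are omitted; the output directory name is hypothetical):

```python
# Hedged sketch of the reported hyperparameters as Hugging Face TrainingArguments.
# Training ran on 4 GPUs: effective train batch = 12 per device x 4 devices x 8 accumulation = 384.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="impossible-llms-french-fronting-bigram",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                    # "Native AMP" mixed precision (requires a GPU)
    label_smoothing_factor=0.1,
)
```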

Training results

Training Loss | Epoch | Step | Validation Loss
79.3638 0.9412 12 9.7053
72.7332 1.9412 24 8.9806
68.7253 2.9412 36 8.4845
65.466 3.9412 48 8.1040
62.7574 4.9412 60 7.7614
59.6826 5.9412 72 7.4294
57.255 6.9412 84 7.0866
54.3365 7.9412 96 6.7286
51.5509 8.9412 108 6.3944
49.139 9.9412 120 6.0900
47.3971 10.9412 132 5.8603
45.8372 11.9412 144 5.7262
45.1168 12.9412 156 5.6438
44.5466 13.9412 168 5.5803
44.2595 14.9412 180 5.5257
43.5834 15.9412 192 5.4881
43.9145 16.9412 204 5.4543
42.9936 17.9412 216 5.4226
43.0455 18.9412 228 5.3785
42.7153 19.9412 240 5.3556
42.3413 20.9412 252 5.3213
42.307 21.9412 264 5.2894
41.9037 22.9412 276 5.2577
41.5448 23.9412 288 5.2356
41.4459 24.9412 300 5.2112
40.8626 25.9412 312 5.1827
41.004 26.9412 324 5.1599
40.588 27.9412 336 5.1265
39.9792 28.9412 348 5.1023
40.1876 29.9412 360 5.0754
40.2066 30.9412 372 5.0545
39.6226 31.9412 384 5.0295
39.6474 32.9412 396 5.0010
39.2219 33.9412 408 4.9767
39.335 34.9412 420 4.9552
38.95 35.9412 432 4.9356
38.3854 36.9412 444 4.9105
38.5026 37.9412 456 4.8881
38.3749 38.9412 468 4.8698
37.9279 39.9412 480 4.8515
37.976 40.9412 492 4.8338
37.4433 41.9412 504 4.8151
37.4726 42.9412 516 4.8001
37.402 43.9412 528 4.7840
37.2191 44.9412 540 4.7649
37.0974 45.9412 552 4.7477
36.8227 46.9412 564 4.7360
36.7461 47.9412 576 4.7238
36.4487 48.9412 588 4.7104
36.2078 49.9412 600 4.6986
36.1915 50.9412 612 4.6926
35.8194 51.9412 624 4.6761
35.7819 52.9412 636 4.6695
35.2404 53.9412 648 4.6667
35.8045 54.9412 660 4.6527
35.2655 55.9412 672 4.6506
35.0667 56.9412 684 4.6414
35.2667 57.9412 696 4.6386
35.017 58.9412 708 4.6305
34.8225 59.9412 720 4.6193
34.5411 60.9412 732 4.6226
34.4989 61.9412 744 4.6198
34.344 62.9412 756 4.6145
34.1108 63.9412 768 4.6180
33.9328 64.9412 780 4.6055
33.8793 65.9412 792 4.6116
33.9447 66.9412 804 4.6077
33.7084 67.9412 816 4.6114
33.3995 68.9412 828 4.6089
33.2015 69.9412 840 4.6083
33.1283 70.9412 852 4.6056
32.8073 71.9412 864 4.6127
32.7656 72.9412 876 4.6181
32.8242 73.9412 888 4.6176
32.3365 74.9412 900 4.6272
32.1099 75.9412 912 4.6277
32.311 76.9412 924 4.6325
32.2731 77.9412 936 4.6344
32.003 78.9412 948 4.6399
31.9208 79.9412 960 4.6456
31.6538 80.9412 972 4.6514
31.568 81.9412 984 4.6606
31.4382 82.9412 996 4.6625
31.2878 83.9412 1008 4.6732
30.873 84.9412 1020 4.6823
30.8787 85.9412 1032 4.6894
30.7165 86.9412 1044 4.6912
30.5921 87.9412 1056 4.6992
30.8143 88.9412 1068 4.7139
30.5748 89.9412 1080 4.7246
30.1792 90.9412 1092 4.7335
29.8887 91.9412 1104 4.7427
30.0604 92.9412 1116 4.7466
30.0204 93.9412 1128 4.7574
30.0081 94.9412 1140 4.7746
29.5375 95.9412 1152 4.7775
29.3867 96.9412 1164 4.7923
28.9751 97.9412 1176 4.7936
29.0835 98.9412 1188 4.8100
29.0994 99.9412 1200 4.8170
29.0009 100.9412 1212 4.8260
28.7146 101.9412 1224 4.8470
28.5965 102.9412 1236 4.8544
28.6353 103.9412 1248 4.8593
28.205 104.9412 1260 4.8768
28.1108 105.9412 1272 4.8952
28.0881 106.9412 1284 4.9025
28.0538 107.9412 1296 4.9068
27.9017 108.9412 1308 4.9150
27.6376 109.9412 1320 4.9353
27.6416 110.9412 1332 4.9469
27.5867 111.9412 1344 4.9547
27.3376 112.9412 1356 4.9602
27.3628 113.9412 1368 4.9710
27.1877 114.9412 1380 4.9861
26.7264 115.9412 1392 4.9937
26.9175 116.9412 1404 5.0074
26.6785 117.9412 1416 5.0188
26.5621 118.9412 1428 5.0318
26.6092 119.9412 1440 5.0444
26.3616 120.9412 1452 5.0541
26.2821 121.9412 1464 5.0668
26.2052 122.9412 1476 5.0814
26.213 123.9412 1488 5.0853
25.9622 124.9412 1500 5.1022
25.9021 125.9412 1512 5.1077
25.7813 126.9412 1524 5.1170
25.5874 127.9412 1536 5.1269
25.4117 128.9412 1548 5.1411
25.4952 129.9412 1560 5.1624
25.3666 130.9412 1572 5.1609
25.2289 131.9412 1584 5.1728
25.1949 132.9412 1596 5.1823
25.1165 133.9412 1608 5.1952
24.9444 134.9412 1620 5.2040
24.8856 135.9412 1632 5.2116
24.7444 136.9412 1644 5.2232
24.7497 137.9412 1656 5.2349
24.7869 138.9412 1668 5.2486
24.3569 139.9412 1680 5.2533
24.3997 140.9412 1692 5.2605
24.3834 141.9412 1704 5.2691
24.0957 142.9412 1716 5.2804
24.131 143.9412 1728 5.2930
24.039 144.9412 1740 5.3034
23.9904 145.9412 1752 5.3122
23.8007 146.9412 1764 5.3210
23.7843 147.9412 1776 5.3281
23.7926 148.9412 1788 5.3347
23.4625 149.9412 1800 5.3482
23.5426 150.9412 1812 5.3527
23.4172 151.9412 1824 5.3705
23.2591 152.9412 1836 5.3743
23.3679 153.9412 1848 5.3804
23.3028 154.9412 1860 5.3898
23.2599 155.9412 1872 5.3972
23.1016 156.9412 1884 5.4080
22.9727 157.9412 1896 5.4138
23.1486 158.9412 1908 5.4240
22.9229 159.9412 1920 5.4380
22.6859 160.9412 1932 5.4470
22.8275 161.9412 1944 5.4500
22.7194 162.9412 1956 5.4544
22.5244 163.9412 1968 5.4649
22.5588 164.9412 1980 5.4664
22.4303 165.9412 1992 5.4799
22.4253 166.9412 2004 5.4794
22.4683 167.9412 2016 5.4920
22.3942 168.9412 2028 5.4977
22.275 169.9412 2040 5.5009
22.3133 170.9412 2052 5.5124
22.0386 171.9412 2064 5.5192
22.1633 172.9412 2076 5.5229
22.082 173.9412 2088 5.5312
21.8905 174.9412 2100 5.5384
21.993 175.9412 2112 5.5460
21.9738 176.9412 2124 5.5444
21.8511 177.9412 2136 5.5582
21.7059 178.9412 2148 5.5566
21.8077 179.9412 2160 5.5691
21.7206 180.9412 2172 5.5743
21.8078 181.9412 2184 5.5766
21.611 182.9412 2196 5.5795
21.5848 183.9412 2208 5.5844
21.5887 184.9412 2220 5.5927
21.3867 185.9412 2232 5.5954
21.614 186.9412 2244 5.6014
21.577 187.9412 2256 5.6093
21.4414 188.9412 2268 5.6056
21.3891 189.9412 2280 5.6108
21.3946 190.9412 2292 5.6237
21.3329 191.9412 2304 5.6233
21.3753 192.9412 2316 5.6280
21.2187 193.9412 2328 5.6353
21.1449 194.9412 2340 5.6386
21.1833 195.9412 2352 5.6405
21.1162 196.9412 2364 5.6493
20.9992 197.9412 2376 5.6480
21.1088 198.9412 2388 5.6479
21.1226 199.9412 2400 5.6558
21.0518 200.9412 2412 5.6574
20.9664 201.9412 2424 5.6625
20.8954 202.9412 2436 5.6597
20.9799 203.9412 2448 5.6710
20.9788 204.9412 2460 5.6727
20.8937 205.9412 2472 5.6750
20.8753 206.9412 2484 5.6830
20.8515 207.9412 2496 5.6793
20.8154 208.9412 2508 5.6825
20.8392 209.9412 2520 5.6849
20.745 210.9412 2532 5.6868
20.6085 211.9412 2544 5.6885
20.8018 212.9412 2556 5.6913
20.722 213.9412 2568 5.6960
20.658 214.9412 2580 5.6967
20.6702 215.9412 2592 5.6970
20.6859 216.9412 2604 5.6991
20.8232 217.9412 2616 5.7016
20.6151 218.9412 2628 5.7001
20.6224 219.9412 2640 5.7048
20.7091 220.9412 2652 5.7059
20.594 221.9412 2664 5.7070
20.5432 222.9412 2676 5.7090
20.6758 223.9412 2688 5.7107
20.5775 224.9412 2700 5.7117
20.6193 225.9412 2712 5.7139
20.5916 226.9412 2724 5.7143
20.5913 227.9412 2736 5.7156
20.4992 228.9412 2748 5.7164
20.4174 229.9412 2760 5.7175
20.5886 230.9412 2772 5.7181
20.526 231.9412 2784 5.7173
20.4712 232.9412 2796 5.7195
20.6709 233.9412 2808 5.7192
20.5925 234.9412 2820 5.7209
20.5312 235.9412 2832 5.7205
20.5349 236.9412 2844 5.7213
20.4508 237.9412 2856 5.7212
20.4562 238.9412 2868 5.7210
20.5256 239.9412 2880 5.7222
20.4871 240.9412 2892 5.7226
20.3434 241.9412 2904 5.7222
20.4407 242.9412 2916 5.7224
20.4709 243.9412 2928 5.7224
20.3969 244.9412 2940 5.7223
20.4623 245.9412 2952 5.7226
20.4695 246.9412 2964 5.7228
20.4002 247.9412 2976 5.7229
20.5512 248.9412 2988 5.7228
20.5067 249.9412 3000 5.7228
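
Validation loss reaches its minimum of roughly 4.61 around step 780 and then climbs steadily while training loss keeps falling, suggesting the model overfits over the remaining steps. A small sketch for visualizing this trend from a few of the logged points:

```python
# Plot a subsample of the (step, validation loss) pairs from the table above.
import matplotlib.pyplot as plt

steps = [12, 300, 600, 780, 1200, 1800, 2400, 3000]
val_loss = [9.7053, 5.2112, 4.6986, 4.6055, 4.8170, 5.3482, 5.6558, 5.7228]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("Validation loss over training (subsampled)")
plt.show()
```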

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
Model size

  • 126M params
  • Tensor type: F32 (Safetensors)