impossible-llms-french-random-trigram

This model is a fine-tuned version of an unspecified base model (not named in the original card) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.5790
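
If this figure is the mean per-token cross-entropy that the transformers Trainer reports by default, it would correspond to a perplexity of roughly exp(5.5790) ≈ 265. As a caveat, training used label smoothing (0.1, see the hyperparameters below), which inflates the reported loss relative to plain cross-entropy, so this is only a rough estimate:

```python
import math

# Back-of-the-envelope conversion of the reported eval loss to perplexity,
# assuming the loss is mean per-token cross-entropy. The label smoothing
# used during training (0.1) inflates this figure somewhat.
eval_loss = 5.5790
print(math.exp(eval_loss))  # ~264.8
```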

Model description

More information needed. (The published checkpoint is a ~126M-parameter model stored as F32 safetensors.)

Intended uses & limitations

More information needed
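
In the absence of documentation, the following is a minimal loading sketch rather than an official usage example. It assumes the checkpoint is a causal language model published under the repo id IParraMartin/impossible-llms-french-random-trigram; the French prompt is an arbitrary placeholder.

```python
# Minimal loading sketch (assumption: this is a causal LM checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-french-random-trigram"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Arbitrary French prompt; generate 20 new tokens.
inputs = tokenizer("Le chat dort", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```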

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
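
As noted above, these settings map directly onto the transformers TrainingArguments API. Below is a minimal sketch of an equivalent configuration under that assumption; output_dir is a placeholder, and the 4-GPU distributed setup would come from the launcher (e.g. torchrun), not from these arguments.

```python
# Sketch of the configuration above using transformers.TrainingArguments.
# output_dir is a hypothetical placeholder; everything else mirrors the list.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="impossible-llms-french-random-trigram",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,  # x 4 GPUs x 8 accumulation steps = 384 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    gradient_accumulation_steps=8,
    seed=0,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 (defaults)
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                       # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
```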

Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 19.5388 | 1.0 | 14 | 9.5505 |
| 17.9401 | 2.0 | 28 | 8.8721 |
| 16.7452 | 3.0 | 42 | 8.3110 |
| 15.9672 | 4.0 | 56 | 7.9213 |
| 15.2666 | 5.0 | 70 | 7.5166 |
| 14.221 | 6.0 | 84 | 7.0987 |
| 13.4173 | 7.0 | 98 | 6.6794 |
| 12.696 | 8.0 | 112 | 6.2922 |
| 12.0732 | 9.0 | 126 | 5.9727 |
| 11.5407 | 10.0 | 140 | 5.7545 |
| 11.1373 | 11.0 | 154 | 5.6278 |
| 11.0415 | 12.0 | 168 | 5.5557 |
| 11.0857 | 13.0 | 182 | 5.4994 |
| 10.8756 | 14.0 | 196 | 5.4403 |
| 10.7797 | 15.0 | 210 | 5.3880 |
| 10.8094 | 16.0 | 224 | 5.3437 |
| 10.545 | 17.0 | 238 | 5.3122 |
| 10.4286 | 18.0 | 252 | 5.2743 |
| 10.4704 | 19.0 | 266 | 5.2344 |
| 10.4322 | 20.0 | 280 | 5.1965 |
| 10.3955 | 21.0 | 294 | 5.1627 |
| 10.1824 | 22.0 | 308 | 5.1350 |
| 10.3351 | 23.0 | 322 | 5.0994 |
| 10.0999 | 24.0 | 336 | 5.0613 |
| 9.9475 | 25.0 | 350 | 5.0338 |
| 10.0076 | 26.0 | 364 | 5.0021 |
| 9.8902 | 27.0 | 378 | 4.9742 |
| 9.7566 | 28.0 | 392 | 4.9420 |
| 9.5947 | 29.0 | 406 | 4.9075 |
| 9.5567 | 30.0 | 420 | 4.8829 |
| 9.6307 | 31.0 | 434 | 4.8543 |
| 9.5312 | 32.0 | 448 | 4.8264 |
| 9.631 | 33.0 | 462 | 4.8035 |
| 9.374 | 34.0 | 476 | 4.7773 |
| 9.2836 | 35.0 | 490 | 4.7586 |
| 8.9454 | 36.0 | 504 | 4.7365 |
| 9.245 | 37.0 | 518 | 4.7158 |
| 9.3739 | 38.0 | 532 | 4.7004 |
| 9.1333 | 39.0 | 546 | 4.6807 |
| 9.1148 | 40.0 | 560 | 4.6652 |
| 9.0499 | 41.0 | 574 | 4.6520 |
| 9.1336 | 42.0 | 588 | 4.6394 |
| 8.8996 | 43.0 | 602 | 4.6258 |
| 8.9487 | 44.0 | 616 | 4.6114 |
| 8.7108 | 45.0 | 630 | 4.6009 |
| 8.8296 | 46.0 | 644 | 4.5939 |
| 8.8617 | 47.0 | 658 | 4.5823 |
| 8.6744 | 48.0 | 672 | 4.5743 |
| 8.6516 | 49.0 | 686 | 4.5677 |
| 8.7278 | 50.0 | 700 | 4.5603 |
| 8.754 | 51.0 | 714 | 4.5555 |
| 8.5846 | 52.0 | 728 | 4.5513 |
| 8.6258 | 53.0 | 742 | 4.5455 |
| 8.5361 | 54.0 | 756 | 4.5430 |
| 8.301 | 55.0 | 770 | 4.5423 |
| 8.461 | 56.0 | 784 | 4.5390 |
| 8.3438 | 57.0 | 798 | 4.5387 |
| 8.3601 | 58.0 | 812 | 4.5399 |
| 8.2845 | 59.0 | 826 | 4.5374 |
| 8.3098 | 60.0 | 840 | 4.5390 |
| 8.1923 | 61.0 | 854 | 4.5396 |
| 8.188 | 62.0 | 868 | 4.5458 |
| 8.2348 | 63.0 | 882 | 4.5458 |
| 8.0945 | 64.0 | 896 | 4.5509 |
| 8.0231 | 65.0 | 910 | 4.5545 |
| 8.1322 | 66.0 | 924 | 4.5577 |
| 8.0243 | 67.0 | 938 | 4.5633 |
| 7.967 | 68.0 | 952 | 4.5667 |
| 7.8999 | 69.0 | 966 | 4.5763 |
| 7.8098 | 70.0 | 980 | 4.5799 |
| 7.8359 | 71.0 | 994 | 4.5920 |
| 7.8627 | 72.0 | 1008 | 4.5911 |
| 7.7559 | 73.0 | 1022 | 4.6069 |
| 7.753 | 74.0 | 1036 | 4.6096 |
| 7.7662 | 75.0 | 1050 | 4.6219 |
| 7.6475 | 76.0 | 1064 | 4.6295 |
| 7.4705 | 77.0 | 1078 | 4.6425 |
| 7.5925 | 78.0 | 1092 | 4.6432 |
| 7.5229 | 79.0 | 1106 | 4.6607 |
| 7.5707 | 80.0 | 1120 | 4.6689 |
| 7.4744 | 81.0 | 1134 | 4.6762 |
| 7.4192 | 82.0 | 1148 | 4.6899 |
| 7.3259 | 83.0 | 1162 | 4.6976 |
| 7.3084 | 84.0 | 1176 | 4.7109 |
| 7.3203 | 85.0 | 1190 | 4.7242 |
| 7.1939 | 86.0 | 1204 | 4.7307 |
| 7.1368 | 87.0 | 1218 | 4.7503 |
| 7.2996 | 88.0 | 1232 | 4.7580 |
| 7.0555 | 89.0 | 1246 | 4.7690 |
| 7.1743 | 90.0 | 1260 | 4.7812 |
| 7.0033 | 91.0 | 1274 | 4.7911 |
| 7.0944 | 92.0 | 1288 | 4.8034 |
| 6.893 | 93.0 | 1302 | 4.8147 |
| 6.9475 | 94.0 | 1316 | 4.8248 |
| 7.0015 | 95.0 | 1330 | 4.8375 |
| 6.8675 | 96.0 | 1344 | 4.8429 |
| 6.8802 | 97.0 | 1358 | 4.8640 |
| 6.884 | 98.0 | 1372 | 4.8785 |
| 6.7255 | 99.0 | 1386 | 4.8808 |
| 6.7309 | 100.0 | 1400 | 4.8979 |
| 6.7191 | 101.0 | 1414 | 4.9133 |
| 6.7445 | 102.0 | 1428 | 4.9165 |
| 6.7745 | 103.0 | 1442 | 4.9349 |
| 6.5724 | 104.0 | 1456 | 4.9463 |
| 6.6491 | 105.0 | 1470 | 4.9572 |
| 6.5131 | 106.0 | 1484 | 4.9676 |
| 6.6026 | 107.0 | 1498 | 4.9795 |
| 6.4379 | 108.0 | 1512 | 4.9951 |
| 6.4879 | 109.0 | 1526 | 5.0042 |
| 6.5413 | 110.0 | 1540 | 5.0177 |
| 6.41 | 111.0 | 1554 | 5.0247 |
| 6.4405 | 112.0 | 1568 | 5.0434 |
| 6.3561 | 113.0 | 1582 | 5.0554 |
| 6.364 | 114.0 | 1596 | 5.0739 |
| 6.3571 | 115.0 | 1610 | 5.0756 |
| 6.2677 | 116.0 | 1624 | 5.0883 |
| 6.2884 | 117.0 | 1638 | 5.0951 |
| 6.3058 | 118.0 | 1652 | 5.1093 |
| 6.266 | 119.0 | 1666 | 5.1246 |
| 6.2004 | 120.0 | 1680 | 5.1309 |
| 6.2179 | 121.0 | 1694 | 5.1415 |
| 6.1123 | 122.0 | 1708 | 5.1508 |
| 6.0874 | 123.0 | 1722 | 5.1689 |
| 6.1809 | 124.0 | 1736 | 5.1704 |
| 6.1127 | 125.0 | 1750 | 5.1792 |
| 5.995 | 126.0 | 1764 | 5.1942 |
| 6.0025 | 127.0 | 1778 | 5.2078 |
| 5.9974 | 128.0 | 1792 | 5.2143 |
| 6.0356 | 129.0 | 1806 | 5.2271 |
| 5.9101 | 130.0 | 1820 | 5.2342 |
| 6.067 | 131.0 | 1834 | 5.2491 |
| 5.9895 | 132.0 | 1848 | 5.2499 |
| 5.9256 | 133.0 | 1862 | 5.2619 |
| 5.9277 | 134.0 | 1876 | 5.2724 |
| 5.8285 | 135.0 | 1890 | 5.2761 |
| 5.8923 | 136.0 | 1904 | 5.2850 |
| 5.8769 | 137.0 | 1918 | 5.2900 |
| 5.8877 | 138.0 | 1932 | 5.3080 |
| 5.8414 | 139.0 | 1946 | 5.3127 |
| 5.8333 | 140.0 | 1960 | 5.3192 |
| 5.8625 | 141.0 | 1974 | 5.3270 |
| 5.7775 | 142.0 | 1988 | 5.3342 |
| 5.784 | 143.0 | 2002 | 5.3460 |
| 5.6887 | 144.0 | 2016 | 5.3503 |
| 5.7328 | 145.0 | 2030 | 5.3657 |
| 5.6659 | 146.0 | 2044 | 5.3597 |
| 5.6671 | 147.0 | 2058 | 5.3763 |
| 5.682 | 148.0 | 2072 | 5.3856 |
| 5.6351 | 149.0 | 2086 | 5.3962 |
| 5.6191 | 150.0 | 2100 | 5.3962 |
| 5.623 | 151.0 | 2114 | 5.4065 |
| 5.6142 | 152.0 | 2128 | 5.4114 |
| 5.5707 | 153.0 | 2142 | 5.4184 |
| 5.5349 | 154.0 | 2156 | 5.4250 |
| 5.5473 | 155.0 | 2170 | 5.4287 |
| 5.5398 | 156.0 | 2184 | 5.4351 |
| 5.5673 | 157.0 | 2198 | 5.4374 |
| 5.5278 | 158.0 | 2212 | 5.4449 |
| 5.5364 | 159.0 | 2226 | 5.4522 |
| 5.5453 | 160.0 | 2240 | 5.4590 |
| 5.5011 | 161.0 | 2254 | 5.4665 |
| 5.4966 | 162.0 | 2268 | 5.4745 |
| 5.4564 | 163.0 | 2282 | 5.4779 |
| 5.479 | 164.0 | 2296 | 5.4831 |
| 5.4074 | 165.0 | 2310 | 5.4861 |
| 5.4466 | 166.0 | 2324 | 5.4911 |
| 5.4323 | 167.0 | 2338 | 5.4945 |
| 5.4185 | 168.0 | 2352 | 5.4986 |
| 5.4081 | 169.0 | 2366 | 5.5008 |
| 5.4454 | 170.0 | 2380 | 5.5081 |
| 5.3599 | 171.0 | 2394 | 5.5100 |
| 5.3657 | 172.0 | 2408 | 5.5110 |
| 5.3012 | 173.0 | 2422 | 5.5204 |
| 5.3654 | 174.0 | 2436 | 5.5213 |
| 5.3618 | 175.0 | 2450 | 5.5238 |
| 5.3746 | 176.0 | 2464 | 5.5344 |
| 5.348 | 177.0 | 2478 | 5.5321 |
| 5.3121 | 178.0 | 2492 | 5.5341 |
| 5.3438 | 179.0 | 2506 | 5.5416 |
| 5.3456 | 180.0 | 2520 | 5.5440 |
| 5.3452 | 181.0 | 2534 | 5.5471 |
| 5.302 | 182.0 | 2548 | 5.5489 |
| 5.3466 | 183.0 | 2562 | 5.5487 |
| 5.3351 | 184.0 | 2576 | 5.5524 |
| 5.292 | 185.0 | 2590 | 5.5547 |
| 5.3786 | 186.0 | 2604 | 5.5566 |
| 5.2728 | 187.0 | 2618 | 5.5592 |
| 5.3039 | 188.0 | 2632 | 5.5618 |
| 5.2933 | 189.0 | 2646 | 5.5623 |
| 5.2462 | 190.0 | 2660 | 5.5642 |
| 5.2955 | 191.0 | 2674 | 5.5672 |
| 5.3091 | 192.0 | 2688 | 5.5677 |
| 5.2952 | 193.0 | 2702 | 5.5695 |
| 5.2556 | 194.0 | 2716 | 5.5697 |
| 5.256 | 195.0 | 2730 | 5.5697 |
| 5.2803 | 196.0 | 2744 | 5.5723 |
| 5.2666 | 197.0 | 2758 | 5.5742 |
| 5.282 | 198.0 | 2772 | 5.5750 |
| 5.2596 | 199.0 | 2786 | 5.5758 |
| 5.242 | 200.0 | 2800 | 5.5766 |
| 5.3405 | 201.0 | 2814 | 5.5762 |
| 5.3114 | 202.0 | 2828 | 5.5767 |
| 5.2631 | 203.0 | 2842 | 5.5775 |
| 5.2528 | 204.0 | 2856 | 5.5776 |
| 5.2509 | 205.0 | 2870 | 5.5780 |
| 5.3044 | 206.0 | 2884 | 5.5782 |
| 5.2655 | 207.0 | 2898 | 5.5781 |
| 5.2337 | 208.0 | 2912 | 5.5785 |
| 5.2294 | 209.0 | 2926 | 5.5790 |
| 5.25 | 210.0 | 2940 | 5.5787 |
| 5.2545 | 211.0 | 2954 | 5.5788 |
| 5.2359 | 212.0 | 2968 | 5.5789 |
| 5.2535 | 213.0 | 2982 | 5.5790 |
| 5.2237 | 214.0 | 2996 | 5.5790 |
| 21.0264 | 214.3019 | 3000 | 5.5790 |
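
Validation loss bottoms out at about 4.537 (epoch 59, step 826) and then rises steadily while training loss keeps falling, a classic overfitting pattern; the headline 5.5790 is the final-step value rather than the best checkpoint.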

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0