impossible-llms-french-random-fourgram

This model is a fine-tuned version of an unspecified base model (not recorded in this card) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.7709
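
Since the reported metric is token-level cross-entropy (trained with label smoothing 0.1, per the hyperparameters below), exp(loss) gives a rough perplexity estimate; label smoothing inflates the reported loss somewhat, so treat this as approximate. A minimal conversion sketch:

```python
import math

eval_loss = 5.7709  # final validation loss from this card

# exp(cross-entropy) approximates perplexity; with label smoothing
# (0.1 here) the smoothed loss overstates the true cross-entropy,
# so this is a rough upper estimate rather than an exact value.
perplexity = math.exp(eval_loss)
print(f"approx. perplexity: {perplexity:.1f}")  # ≈ 320.8
```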

Model description

More information needed

Intended uses & limitations

More information needed
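
Pending fuller documentation, the checkpoint can presumably be loaded like any causal language model on the Hub. A minimal sketch, assuming the standard AutoModelForCausalLM interface and the repo id shown in this card (the prompt is an illustrative placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-french-random-fourgram"

# Assumes the repo ships both a tokenizer and a causal-LM head;
# adjust if the checkpoint uses a different head or tokenizer.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Le chat", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```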

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the equivalent TrainingArguments follows the list:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
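
Note that the total batch sizes are derived rather than set directly: 12 per device × 4 GPUs × 8 gradient-accumulation steps = 384 for training, and 8 × 4 = 32 for evaluation. A hedged sketch of how these values map onto transformers.TrainingArguments; the output directory is a placeholder, and the 4-GPU launch would be handled by the launcher (e.g. torchrun), not by these arguments:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
args = TrainingArguments(
    output_dir="out",                  # placeholder, not from this card
    learning_rate=1e-4,
    per_device_train_batch_size=12,    # x 4 GPUs x 8 accum = 384 total
    per_device_eval_batch_size=8,      # x 4 GPUs = 32 total
    gradient_accumulation_steps=8,
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                         # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
```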

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 19.3413 | 1.0 | 14 | 9.5560 |
| 17.8282 | 2.0 | 28 | 8.8858 |
| 16.7865 | 3.0 | 42 | 8.3290 |
| 16.2985 | 4.0 | 56 | 7.9541 |
| 15.2854 | 5.0 | 70 | 7.5663 |
| 14.1165 | 6.0 | 84 | 7.1540 |
| 13.6 | 7.0 | 98 | 6.7456 |
| 12.6682 | 8.0 | 112 | 6.3757 |
| 12.1537 | 9.0 | 126 | 6.0589 |
| 11.8659 | 10.0 | 140 | 5.8500 |
| 11.4423 | 11.0 | 154 | 5.7267 |
| 11.2484 | 12.0 | 168 | 5.6468 |
| 11.2254 | 13.0 | 182 | 5.5859 |
| 11.0824 | 14.0 | 196 | 5.5306 |
| 10.9985 | 15.0 | 210 | 5.4810 |
| 10.7426 | 16.0 | 224 | 5.4386 |
| 10.7897 | 17.0 | 238 | 5.4003 |
| 10.6788 | 18.0 | 252 | 5.3649 |
| 10.6826 | 19.0 | 266 | 5.3284 |
| 10.5997 | 20.0 | 280 | 5.2927 |
| 10.6804 | 21.0 | 294 | 5.2608 |
| 10.4569 | 22.0 | 308 | 5.2348 |
| 10.3926 | 23.0 | 322 | 5.1974 |
| 10.4329 | 24.0 | 336 | 5.1658 |
| 10.2911 | 25.0 | 350 | 5.1354 |
| 10.137 | 26.0 | 364 | 5.1093 |
| 9.9448 | 27.0 | 378 | 5.0777 |
| 10.1379 | 28.0 | 392 | 5.0526 |
| 9.9001 | 29.0 | 406 | 5.0275 |
| 9.9793 | 30.0 | 420 | 5.0011 |
| 9.877 | 31.0 | 434 | 4.9731 |
| 9.7064 | 32.0 | 448 | 4.9436 |
| 9.7728 | 33.0 | 462 | 4.9232 |
| 9.7954 | 34.0 | 476 | 4.9039 |
| 9.7143 | 35.0 | 490 | 4.8785 |
| 9.5204 | 36.0 | 504 | 4.8604 |
| 9.5834 | 37.0 | 518 | 4.8411 |
| 9.5114 | 38.0 | 532 | 4.8253 |
| 9.4687 | 39.0 | 546 | 4.8085 |
| 9.5096 | 40.0 | 560 | 4.7967 |
| 9.3579 | 41.0 | 574 | 4.7805 |
| 9.3129 | 42.0 | 588 | 4.7687 |
| 9.2536 | 43.0 | 602 | 4.7568 |
| 9.2814 | 44.0 | 616 | 4.7451 |
| 9.0799 | 45.0 | 630 | 4.7389 |
| 9.117 | 46.0 | 644 | 4.7261 |
| 9.1623 | 47.0 | 658 | 4.7200 |
| 9.113 | 48.0 | 672 | 4.7107 |
| 8.8764 | 49.0 | 686 | 4.7052 |
| 8.9128 | 50.0 | 700 | 4.7027 |
| 8.9086 | 51.0 | 714 | 4.6936 |
| 8.9187 | 52.0 | 728 | 4.6915 |
| 8.7324 | 53.0 | 742 | 4.6900 |
| 8.7402 | 54.0 | 756 | 4.6837 |
| 8.7481 | 55.0 | 770 | 4.6854 |
| 8.7484 | 56.0 | 784 | 4.6828 |
| 8.7518 | 57.0 | 798 | 4.6822 |
| 8.5725 | 58.0 | 812 | 4.6833 |
| 8.4755 | 59.0 | 826 | 4.6840 |
| 8.4294 | 60.0 | 840 | 4.6829 |
| 8.4633 | 61.0 | 854 | 4.6888 |
| 8.4685 | 62.0 | 868 | 4.6929 |
| 8.3212 | 63.0 | 882 | 4.6926 |
| 8.387 | 64.0 | 896 | 4.6999 |
| 8.3862 | 65.0 | 910 | 4.7042 |
| 8.2601 | 66.0 | 924 | 4.7073 |
| 8.2653 | 67.0 | 938 | 4.7148 |
| 8.2147 | 68.0 | 952 | 4.7240 |
| 8.1959 | 69.0 | 966 | 4.7297 |
| 8.1291 | 70.0 | 980 | 4.7366 |
| 8.0387 | 71.0 | 994 | 4.7429 |
| 8.1311 | 72.0 | 1008 | 4.7492 |
| 7.9853 | 73.0 | 1022 | 4.7598 |
| 7.9861 | 74.0 | 1036 | 4.7649 |
| 7.917 | 75.0 | 1050 | 4.7758 |
| 7.8244 | 76.0 | 1064 | 4.7903 |
| 7.818 | 77.0 | 1078 | 4.7944 |
| 7.7401 | 78.0 | 1092 | 4.8060 |
| 7.8455 | 79.0 | 1106 | 4.8160 |
| 7.8481 | 80.0 | 1120 | 4.8232 |
| 7.755 | 81.0 | 1134 | 4.8323 |
| 7.7421 | 82.0 | 1148 | 4.8490 |
| 7.605 | 83.0 | 1162 | 4.8602 |
| 7.6062 | 84.0 | 1176 | 4.8701 |
| 7.5694 | 85.0 | 1190 | 4.8825 |
| 7.5111 | 86.0 | 1204 | 4.8959 |
| 7.5136 | 87.0 | 1218 | 4.9056 |
| 7.3802 | 88.0 | 1232 | 4.9169 |
| 7.4024 | 89.0 | 1246 | 4.9302 |
| 7.4278 | 90.0 | 1260 | 4.9438 |
| 7.2473 | 91.0 | 1274 | 4.9553 |
| 7.1056 | 92.0 | 1288 | 4.9698 |
| 7.3367 | 93.0 | 1302 | 4.9804 |
| 7.2153 | 94.0 | 1316 | 4.9915 |
| 7.1236 | 95.0 | 1330 | 5.0020 |
| 7.238 | 96.0 | 1344 | 5.0151 |
| 7.1268 | 97.0 | 1358 | 5.0281 |
| 7.0722 | 98.0 | 1372 | 5.0430 |
| 6.9907 | 99.0 | 1386 | 5.0533 |
| 7.0275 | 100.0 | 1400 | 5.0652 |
| 7.1082 | 101.0 | 1414 | 5.0770 |
| 7.0626 | 102.0 | 1428 | 5.0962 |
| 6.9378 | 103.0 | 1442 | 5.1016 |
| 6.896 | 104.0 | 1456 | 5.1189 |
| 6.864 | 105.0 | 1470 | 5.1320 |
| 6.8943 | 106.0 | 1484 | 5.1349 |
| 6.8454 | 107.0 | 1498 | 5.1519 |
| 6.7281 | 108.0 | 1512 | 5.1649 |
| 6.7745 | 109.0 | 1526 | 5.1831 |
| 6.5361 | 110.0 | 1540 | 5.1951 |
| 6.6865 | 111.0 | 1554 | 5.1988 |
| 6.6242 | 112.0 | 1568 | 5.2155 |
| 6.6225 | 113.0 | 1582 | 5.2278 |
| 6.5798 | 114.0 | 1596 | 5.2335 |
| 6.556 | 115.0 | 1610 | 5.2532 |
| 6.5604 | 116.0 | 1624 | 5.2645 |
| 6.4749 | 117.0 | 1638 | 5.2743 |
| 6.4891 | 118.0 | 1652 | 5.2869 |
| 6.4335 | 119.0 | 1666 | 5.2986 |
| 6.5114 | 120.0 | 1680 | 5.3109 |
| 6.4408 | 121.0 | 1694 | 5.3212 |
| 6.3667 | 122.0 | 1708 | 5.3298 |
| 6.3584 | 123.0 | 1722 | 5.3408 |
| 6.2831 | 124.0 | 1736 | 5.3542 |
| 6.3055 | 125.0 | 1750 | 5.3632 |
| 6.3451 | 126.0 | 1764 | 5.3695 |
| 6.2636 | 127.0 | 1778 | 5.3902 |
| 6.1909 | 128.0 | 1792 | 5.3946 |
| 6.1821 | 129.0 | 1806 | 5.4046 |
| 6.2121 | 130.0 | 1820 | 5.4137 |
| 6.2157 | 131.0 | 1834 | 5.4222 |
| 6.2115 | 132.0 | 1848 | 5.4285 |
| 6.1631 | 133.0 | 1862 | 5.4377 |
| 6.1074 | 134.0 | 1876 | 5.4495 |
| 6.0796 | 135.0 | 1890 | 5.4648 |
| 6.0416 | 136.0 | 1904 | 5.4746 |
| 6.1123 | 137.0 | 1918 | 5.4801 |
| 5.9995 | 138.0 | 1932 | 5.4863 |
| 6.0616 | 139.0 | 1946 | 5.4929 |
| 6.0098 | 140.0 | 1960 | 5.4995 |
| 5.9556 | 141.0 | 1974 | 5.5076 |
| 5.9591 | 142.0 | 1988 | 5.5204 |
| 5.9247 | 143.0 | 2002 | 5.5277 |
| 5.9409 | 144.0 | 2016 | 5.5407 |
| 5.9081 | 145.0 | 2030 | 5.5536 |
| 5.8869 | 146.0 | 2044 | 5.5602 |
| 5.9413 | 147.0 | 2058 | 5.5636 |
| 5.863 | 148.0 | 2072 | 5.5717 |
| 5.8174 | 149.0 | 2086 | 5.5750 |
| 5.7999 | 150.0 | 2100 | 5.5852 |
| 5.824 | 151.0 | 2114 | 5.5900 |
| 5.8427 | 152.0 | 2128 | 5.5987 |
| 5.6974 | 153.0 | 2142 | 5.6064 |
| 5.7389 | 154.0 | 2156 | 5.6120 |
| 5.773 | 155.0 | 2170 | 5.6140 |
| 5.7372 | 156.0 | 2184 | 5.6244 |
| 5.6788 | 157.0 | 2198 | 5.6267 |
| 5.694 | 158.0 | 2212 | 5.6346 |
| 5.6659 | 159.0 | 2226 | 5.6387 |
| 5.6601 | 160.0 | 2240 | 5.6455 |
| 5.7282 | 161.0 | 2254 | 5.6535 |
| 5.6995 | 162.0 | 2268 | 5.6572 |
| 5.6779 | 163.0 | 2282 | 5.6608 |
| 5.5655 | 164.0 | 2296 | 5.6728 |
| 5.6528 | 165.0 | 2310 | 5.6711 |
| 5.6853 | 166.0 | 2324 | 5.6748 |
| 5.6575 | 167.0 | 2338 | 5.6860 |
| 5.6327 | 168.0 | 2352 | 5.6873 |
| 5.6477 | 169.0 | 2366 | 5.6922 |
| 5.57 | 170.0 | 2380 | 5.6931 |
| 5.6212 | 171.0 | 2394 | 5.6994 |
| 5.5344 | 172.0 | 2408 | 5.7095 |
| 5.608 | 173.0 | 2422 | 5.7115 |
| 5.6274 | 174.0 | 2436 | 5.7163 |
| 5.5226 | 175.0 | 2450 | 5.7169 |
| 5.6039 | 176.0 | 2464 | 5.7195 |
| 5.5918 | 177.0 | 2478 | 5.7207 |
| 5.521 | 178.0 | 2492 | 5.7263 |
| 5.5004 | 179.0 | 2506 | 5.7269 |
| 5.5553 | 180.0 | 2520 | 5.7342 |
| 5.5396 | 181.0 | 2534 | 5.7351 |
| 5.5434 | 182.0 | 2548 | 5.7390 |
| 5.4705 | 183.0 | 2562 | 5.7413 |
| 5.515 | 184.0 | 2576 | 5.7436 |
| 5.5378 | 185.0 | 2590 | 5.7429 |
| 5.5125 | 186.0 | 2604 | 5.7467 |
| 5.5241 | 187.0 | 2618 | 5.7491 |
| 5.4869 | 188.0 | 2632 | 5.7518 |
| 5.492 | 189.0 | 2646 | 5.7538 |
| 5.5174 | 190.0 | 2660 | 5.7542 |
| 5.4813 | 191.0 | 2674 | 5.7575 |
| 5.4454 | 192.0 | 2688 | 5.7589 |
| 5.4896 | 193.0 | 2702 | 5.7597 |
| 5.3964 | 194.0 | 2716 | 5.7616 |
| 5.4764 | 195.0 | 2730 | 5.7630 |
| 5.4792 | 196.0 | 2744 | 5.7635 |
| 5.3841 | 197.0 | 2758 | 5.7653 |
| 5.4504 | 198.0 | 2772 | 5.7665 |
| 5.433 | 199.0 | 2786 | 5.7655 |
| 5.4426 | 200.0 | 2800 | 5.7659 |
| 5.4752 | 201.0 | 2814 | 5.7679 |
| 5.432 | 202.0 | 2828 | 5.7679 |
| 5.4162 | 203.0 | 2842 | 5.7692 |
| 5.4561 | 204.0 | 2856 | 5.7694 |
| 5.3887 | 205.0 | 2870 | 5.7700 |
| 5.4423 | 206.0 | 2884 | 5.7700 |
| 5.4008 | 207.0 | 2898 | 5.7699 |
| 5.4596 | 208.0 | 2912 | 5.7705 |
| 5.3616 | 209.0 | 2926 | 5.7706 |
| 5.3832 | 210.0 | 2940 | 5.7709 |
| 5.4724 | 211.0 | 2954 | 5.7709 |
| 5.4122 | 212.0 | 2968 | 5.7709 |
| 5.3969 | 213.0 | 2982 | 5.7709 |
| 5.4304 | 214.0 | 2996 | 5.7709 |
| 21.836 | 214.3019 | 3000 | 5.7709 |
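
The table shows validation loss bottoming out at about 4.6822 around epoch 57 (step 798) and rising steadily afterwards while training loss keeps falling, so the final checkpoint at step 3000 sits well past the best-validation point. If re-running this recipe, best-checkpoint selection or early stopping would recover the stronger model; a minimal sketch using the stock Transformers callback (these settings are illustrative, not taken from this card):

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Illustrative settings for checkpoint selection; evaluation and
# saving must use the same strategy for best-model tracking to work.
args = TrainingArguments(
    output_dir="out",                 # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Passed to Trainer(callbacks=[...]): stops training after 5
# consecutive evaluations without improvement in eval_loss.
stopper = EarlyStoppingCallback(early_stopping_patience=5)
```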

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model size

126M parameters (F32 tensor type, Safetensors format)