impossible-llms-dutch-random-trigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.1767

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
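Several of the values above are derived rather than independent: the total batch sizes follow from the per-device sizes, device count, and gradient accumulation, and the warmup length follows from the warmup ratio and step budget. A minimal sketch in plain Python (the `cosine_lr` helper is an illustrative reimplementation of a cosine-with-warmup schedule, not the exact scheduler used in training):

```python
import math

# Effective batch sizes implied by the hyperparameters above.
per_device_train_batch_size = 12
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 8

total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)  # 12 * 4 * 8 = 384
total_eval_batch_size = per_device_eval_batch_size * num_devices  # 8 * 4 = 32

# Warmup length implied by lr_scheduler_warmup_ratio and training_steps.
training_steps = 3000
warmup_steps = int(0.1 * training_steps)  # 300

def cosine_lr(step, total_steps=3000, warmup=300, peak_lr=1e-4):
    """Cosine decay with linear warmup (illustrative sketch of
    lr_scheduler_type=cosine with warmup_ratio=0.1)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

print(total_train_batch_size, total_eval_batch_size, warmup_steps)
```

The learning rate rises linearly to its 1e-4 peak over the first 300 steps, then decays along a cosine curve toward zero at step 3000.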

Training results

Training Loss Epoch Step Validation Loss
82.5994 1.0 8 10.0542
75.3584 2.0 16 9.2715
71.8298 3.0 24 8.8846
69.7388 4.0 32 8.6622
67.9059 5.0 40 8.4515
65.8943 6.0 48 8.2601
64.9371 7.0 56 8.0639
64.1393 8.0 64 7.8508
61.5761 9.0 72 7.6357
60.1545 10.0 80 7.4159
58.5955 11.0 88 7.1999
56.5156 12.0 96 6.9792
54.4413 13.0 104 6.7578
52.4555 14.0 112 6.5518
51.4514 15.0 120 6.3584
50.1681 16.0 128 6.2014
48.6725 17.0 136 6.0718
48.0363 18.0 144 6.0036
47.4449 19.0 152 5.9306
47.7943 20.0 160 5.8906
46.7669 21.0 168 5.8501
46.9906 22.0 176 5.8311
46.7802 23.0 184 5.8032
46.3167 24.0 192 5.7732
46.1253 25.0 200 5.7465
46.1639 26.0 208 5.7233
45.7313 27.0 216 5.7011
44.8697 28.0 224 5.6811
44.9405 29.0 232 5.6618
44.7098 30.0 240 5.6458
44.5541 31.0 248 5.6263
44.9985 32.0 256 5.6188
44.7164 33.0 264 5.6017
44.3521 34.0 272 5.5895
44.1616 35.0 280 5.5672
44.4745 36.0 288 5.5586
43.884 37.0 296 5.5509
44.041 38.0 304 5.5372
43.8014 39.0 312 5.5197
43.5336 40.0 320 5.5044
43.8563 41.0 328 5.4919
43.3208 42.0 336 5.4814
42.4841 43.0 344 5.4666
42.682 44.0 352 5.4504
42.5942 45.0 360 5.4392
42.6471 46.0 368 5.4281
42.0434 47.0 376 5.4126
42.0182 48.0 384 5.3987
41.2785 49.0 392 5.3823
41.9833 50.0 400 5.3747
41.2029 51.0 408 5.3579
41.4731 52.0 416 5.3561
41.0419 53.0 424 5.3445
41.1785 54.0 432 5.3324
40.6796 55.0 440 5.3220
40.8005 56.0 448 5.3085
40.8224 57.0 456 5.3090
40.7455 58.0 464 5.2966
40.1623 59.0 472 5.2913
39.9811 60.0 480 5.2814
39.9838 61.0 488 5.2715
39.6819 62.0 496 5.2678
39.9066 63.0 504 5.2580
39.4079 64.0 512 5.2548
39.2164 65.0 520 5.2503
39.5161 66.0 528 5.2464
39.2483 67.0 536 5.2417
38.9285 68.0 544 5.2479
38.82 69.0 552 5.2428
38.8687 70.0 560 5.2381
38.5354 71.0 568 5.2322
37.9785 72.0 576 5.2324
38.2662 73.0 584 5.2372
37.8792 74.0 592 5.2330
37.8011 75.0 600 5.2385
37.5622 76.0 608 5.2393
37.4363 77.0 616 5.2367
37.2311 78.0 624 5.2437
37.388 79.0 632 5.2430
36.7787 80.0 640 5.2490
36.752 81.0 648 5.2609
36.6417 82.0 656 5.2538
36.3101 83.0 664 5.2602
36.1801 84.0 672 5.2574
36.2202 85.0 680 5.2636
35.8867 86.0 688 5.2841
35.3909 87.0 696 5.2803
35.5727 88.0 704 5.2949
35.7404 89.0 712 5.2919
35.3867 90.0 720 5.3118
35.3958 91.0 728 5.3114
34.9035 92.0 736 5.3233
34.8338 93.0 744 5.3217
34.5983 94.0 752 5.3424
34.6717 95.0 760 5.3457
34.0202 96.0 768 5.3570
34.194 97.0 776 5.3590
33.7407 98.0 784 5.3684
33.6082 99.0 792 5.3842
33.8825 100.0 800 5.3972
33.5653 101.0 808 5.4093
33.0536 102.0 816 5.4197
33.2688 103.0 824 5.4239
32.8215 104.0 832 5.4438
32.8538 105.0 840 5.4522
32.3872 106.0 848 5.4627
32.6686 107.0 856 5.4723
32.5385 108.0 864 5.4829
32.0684 109.0 872 5.5106
32.2504 110.0 880 5.5188
32.0645 111.0 888 5.5214
31.6109 112.0 896 5.5390
31.5912 113.0 904 5.5493
31.2763 114.0 912 5.5714
31.3219 115.0 920 5.5731
31.103 116.0 928 5.5915
30.8593 117.0 936 5.6075
30.883 118.0 944 5.6222
30.5909 119.0 952 5.6391
30.0987 120.0 960 5.6483
30.4074 121.0 968 5.6613
30.1413 122.0 976 5.6761
29.9734 123.0 984 5.6864
29.761 124.0 992 5.6941
29.7224 125.0 1000 5.7183
29.5969 126.0 1008 5.7312
29.4819 127.0 1016 5.7498
29.122 128.0 1024 5.7577
28.8915 129.0 1032 5.7759
28.9803 130.0 1040 5.7829
28.7822 131.0 1048 5.7881
28.6867 132.0 1056 5.8079
28.5127 133.0 1064 5.8202
28.2518 134.0 1072 5.8397
27.9477 135.0 1080 5.8579
27.9133 136.0 1088 5.8667
27.9604 137.0 1096 5.8785
27.6479 138.0 1104 5.8923
27.4398 139.0 1112 5.8948
27.4453 140.0 1120 5.9274
27.107 141.0 1128 5.9330
27.1592 142.0 1136 5.9402
26.7765 143.0 1144 5.9617
26.7436 144.0 1152 5.9688
26.4797 145.0 1160 5.9934
26.6271 146.0 1168 5.9979
26.5695 147.0 1176 6.0227
26.2278 148.0 1184 6.0268
26.3147 149.0 1192 6.0485
26.0386 150.0 1200 6.0496
25.9994 151.0 1208 6.0736
25.6954 152.0 1216 6.0737
25.6808 153.0 1224 6.0932
25.5726 154.0 1232 6.1188
25.5548 155.0 1240 6.1280
25.3248 156.0 1248 6.1349
25.0167 157.0 1256 6.1471
24.9439 158.0 1264 6.1657
24.9627 159.0 1272 6.1655
24.797 160.0 1280 6.1841
24.8176 161.0 1288 6.1916
24.4445 162.0 1296 6.2120
24.4471 163.0 1304 6.2158
24.4066 164.0 1312 6.2300
24.1849 165.0 1320 6.2481
24.2606 166.0 1328 6.2574
24.1559 167.0 1336 6.2774
23.8622 168.0 1344 6.2832
23.7267 169.0 1352 6.2942
23.586 170.0 1360 6.3067
23.5871 171.0 1368 6.3185
23.3116 172.0 1376 6.3322
23.3358 173.0 1384 6.3359
23.2364 174.0 1392 6.3462
23.2253 175.0 1400 6.3604
23.0764 176.0 1408 6.3661
22.9777 177.0 1416 6.3790
22.843 178.0 1424 6.3884
22.7189 179.0 1432 6.4069
22.7373 180.0 1440 6.4238
22.6216 181.0 1448 6.4147
22.5603 182.0 1456 6.4370
22.3906 183.0 1464 6.4491
22.4381 184.0 1472 6.4585
22.1994 185.0 1480 6.4711
22.0592 186.0 1488 6.4803
21.9095 187.0 1496 6.4879
21.9612 188.0 1504 6.4955
21.9603 189.0 1512 6.5081
21.846 190.0 1520 6.5085
21.6954 191.0 1528 6.5342
21.6045 192.0 1536 6.5455
21.5128 193.0 1544 6.5483
21.4015 194.0 1552 6.5560
21.4992 195.0 1560 6.5607
21.296 196.0 1568 6.5677
21.2518 197.0 1576 6.5858
21.223 198.0 1584 6.5859
21.1109 199.0 1592 6.5945
21.0745 200.0 1600 6.6123
20.9234 201.0 1608 6.6232
20.8848 202.0 1616 6.6257
20.6494 203.0 1624 6.6360
20.5728 204.0 1632 6.6397
20.6611 205.0 1640 6.6523
20.6581 206.0 1648 6.6601
20.5148 207.0 1656 6.6650
20.3811 208.0 1664 6.6797
20.3773 209.0 1672 6.6829
20.3413 210.0 1680 6.6976
20.2472 211.0 1688 6.7077
20.0545 212.0 1696 6.7108
20.1101 213.0 1704 6.7161
19.9425 214.0 1712 6.7213
19.9614 215.0 1720 6.7333
19.8209 216.0 1728 6.7464
19.8237 217.0 1736 6.7525
19.6726 218.0 1744 6.7475
19.7297 219.0 1752 6.7594
19.6377 220.0 1760 6.7700
19.6132 221.0 1768 6.7751
19.5049 222.0 1776 6.7832
19.4827 223.0 1784 6.7866
19.3998 224.0 1792 6.7915
19.3534 225.0 1800 6.8059
19.2848 226.0 1808 6.8061
19.3685 227.0 1816 6.8101
19.2081 228.0 1824 6.8217
19.1761 229.0 1832 6.8128
19.1179 230.0 1840 6.8253
19.0693 231.0 1848 6.8385
18.9306 232.0 1856 6.8459
18.9219 233.0 1864 6.8500
18.8905 234.0 1872 6.8570
18.8549 235.0 1880 6.8631
18.7845 236.0 1888 6.8661
18.7904 237.0 1896 6.8749
18.7142 238.0 1904 6.8875
18.6035 239.0 1912 6.8926
18.5459 240.0 1920 6.8939
18.5899 241.0 1928 6.8945
18.5584 242.0 1936 6.9038
18.4848 243.0 1944 6.9132
18.5062 244.0 1952 6.9161
18.3082 245.0 1960 6.9171
18.3617 246.0 1968 6.9295
18.3946 247.0 1976 6.9279
18.2304 248.0 1984 6.9355
18.2184 249.0 1992 6.9421
18.213 250.0 2000 6.9424
18.1752 251.0 2008 6.9480
18.06 252.0 2016 6.9539
18.0693 253.0 2024 6.9560
18.0189 254.0 2032 6.9561
17.9015 255.0 2040 6.9649
17.9693 256.0 2048 6.9699
17.9178 257.0 2056 6.9857
17.8822 258.0 2064 6.9825
17.8456 259.0 2072 6.9852
17.8385 260.0 2080 6.9851
17.7816 261.0 2088 6.9962
17.7009 262.0 2096 6.9984
17.7425 263.0 2104 7.0047
17.6348 264.0 2112 7.0037
17.6382 265.0 2120 7.0135
17.7061 266.0 2128 7.0123
17.661 267.0 2136 7.0149
17.5448 268.0 2144 7.0211
17.4749 269.0 2152 7.0287
17.5358 270.0 2160 7.0287
17.4606 271.0 2168 7.0346
17.4813 272.0 2176 7.0378
17.403 273.0 2184 7.0462
17.4206 274.0 2192 7.0419
17.4906 275.0 2200 7.0413
17.3353 276.0 2208 7.0498
17.3957 277.0 2216 7.0507
17.3451 278.0 2224 7.0582
17.3083 279.0 2232 7.0585
17.2388 280.0 2240 7.0610
17.2831 281.0 2248 7.0702
17.1745 282.0 2256 7.0705
17.1825 283.0 2264 7.0736
17.1351 284.0 2272 7.0730
17.1355 285.0 2280 7.0778
17.1596 286.0 2288 7.0801
17.0965 287.0 2296 7.0782
17.0982 288.0 2304 7.0877
17.0794 289.0 2312 7.0873
16.9511 290.0 2320 7.1009
17.0132 291.0 2328 7.0933
16.9379 292.0 2336 7.0972
16.9018 293.0 2344 7.1025
16.9297 294.0 2352 7.1038
16.9443 295.0 2360 7.1024
16.9367 296.0 2368 7.1066
16.8805 297.0 2376 7.1074
16.8863 298.0 2384 7.1133
16.8961 299.0 2392 7.1092
16.8387 300.0 2400 7.1125
16.8368 301.0 2408 7.1157
16.8282 302.0 2416 7.1161
16.8568 303.0 2424 7.1210
16.8066 304.0 2432 7.1196
16.6857 305.0 2440 7.1241
16.7231 306.0 2448 7.1229
16.7 307.0 2456 7.1248
16.7097 308.0 2464 7.1302
16.6619 309.0 2472 7.1302
16.7357 310.0 2480 7.1317
16.6416 311.0 2488 7.1391
16.6208 312.0 2496 7.1367
16.6047 313.0 2504 7.1378
16.5973 314.0 2512 7.1393
16.571 315.0 2520 7.1402
16.5836 316.0 2528 7.1418
16.5634 317.0 2536 7.1435
16.5548 318.0 2544 7.1488
16.563 319.0 2552 7.1510
16.5766 320.0 2560 7.1483
16.4478 321.0 2568 7.1509
16.5622 322.0 2576 7.1535
16.4586 323.0 2584 7.1548
16.4832 324.0 2592 7.1542
16.4289 325.0 2600 7.1570
16.5299 326.0 2608 7.1548
16.4647 327.0 2616 7.1581
16.4929 328.0 2624 7.1577
16.4312 329.0 2632 7.1594
16.5021 330.0 2640 7.1604
16.4607 331.0 2648 7.1632
16.4328 332.0 2656 7.1623
16.3884 333.0 2664 7.1656
16.4128 334.0 2672 7.1655
16.4234 335.0 2680 7.1646
16.4392 336.0 2688 7.1665
16.3881 337.0 2696 7.1660
16.3477 338.0 2704 7.1682
16.4096 339.0 2712 7.1681
16.3908 340.0 2720 7.1702
16.3873 341.0 2728 7.1686
16.4087 342.0 2736 7.1711
16.3875 343.0 2744 7.1713
16.3314 344.0 2752 7.1716
16.3994 345.0 2760 7.1733
16.3845 346.0 2768 7.1713
16.3095 347.0 2776 7.1721
16.3001 348.0 2784 7.1725
16.3388 349.0 2792 7.1743
16.3279 350.0 2800 7.1716
16.3188 351.0 2808 7.1727
16.3254 352.0 2816 7.1741
16.4517 353.0 2824 7.1747
16.322 354.0 2832 7.1745
16.3631 355.0 2840 7.1748
16.3896 356.0 2848 7.1745
16.329 357.0 2856 7.1751
16.3249 358.0 2864 7.1754
16.3464 359.0 2872 7.1754
16.3886 360.0 2880 7.1760
16.3359 361.0 2888 7.1758
16.2931 362.0 2896 7.1759
16.3569 363.0 2904 7.1761
16.3704 364.0 2912 7.1762
16.3221 365.0 2920 7.1767
16.3058 366.0 2928 7.1768
16.2517 367.0 2936 7.1766
16.3604 368.0 2944 7.1764
16.3752 369.0 2952 7.1764
16.3373 370.0 2960 7.1766
16.3252 371.0 2968 7.1766
16.274 372.0 2976 7.1767
16.3587 373.0 2984 7.1767
16.3647 374.0 2992 7.1766
16.3286 375.0 3000 7.1767
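For a language model, the validation loss is easier to interpret as a perplexity via exp(loss). Note that the losses above were computed with label_smoothing_factor=0.1, so this figure overstates the pure cross-entropy perplexity:

```python
import math

# Final validation loss from the last row of the table above.
eval_loss = 7.1767

# exp(loss) converts a per-token cross-entropy loss to perplexity.
# With label smoothing of 0.1 baked into the loss, this is an upper
# bound on the model's true cross-entropy perplexity.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")
```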

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
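To reproduce this environment, one could pin the listed versions; this is a hypothetical sketch (the cu121 extra index matches the `2.4.0+cu121` PyTorch build noted above and assumes a CUDA 12.1 system):

```shell
pip install "transformers==4.49.0" "datasets==3.4.0" "tokenizers==0.21.0"
pip install "torch==2.4.0" --index-url https://download.pytorch.org/whl/cu121
```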
Model details

  • Model size: 126M parameters (Safetensors)
  • Tensor type: F32
