impossible-llms-dutch-random

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.8703
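
For intuition, this loss can be converted to perplexity. This is a sketch, assuming the reported value is the mean token-level cross-entropy in nats (the standard for causal LMs); note that the 0.1 label smoothing used in training inflates the loss, so the resulting perplexity overstates the unsmoothed value.

```python
import math

# Perplexity = exp(mean cross-entropy loss), assuming loss is in nats.
eval_loss = 7.8703
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # roughly 2618
```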

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
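
The batch-size and scheduler figures above are mutually consistent; a quick check in plain Python (no libraries needed):

```python
# Effective (total) train batch size implied by the per-device settings.
per_device_train_batch_size = 12
num_devices = 4
gradient_accumulation_steps = 8

total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 384, matching total_train_batch_size above

# Warmup steps implied by the warmup ratio and total training steps.
training_steps = 3000
warmup_ratio = 0.1
warmup_steps = int(training_steps * warmup_ratio)
print(warmup_steps)  # 300
```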

Training results

Training Loss Epoch Step Validation Loss
82.5297 1.0 8 10.0770
75.2713 2.0 16 9.3215
72.1748 3.0 24 8.9474
70.3836 4.0 32 8.7533
68.7233 5.0 40 8.5694
68.1397 6.0 48 8.3998
65.9139 7.0 56 8.2237
64.9912 8.0 64 8.0185
63.1884 9.0 72 7.8050
61.0758 10.0 80 7.5816
58.7919 11.0 88 7.3522
57.6353 12.0 96 7.1355
55.7919 13.0 104 6.9140
54.3793 14.0 112 6.7084
52.4426 15.0 120 6.5207
50.9518 16.0 128 6.3680
49.9165 17.0 136 6.2592
49.6506 18.0 144 6.1793
49.1419 19.0 152 6.1347
48.6417 20.0 160 6.0991
48.4127 21.0 168 6.0739
48.1562 22.0 176 6.0576
48.2511 23.0 184 6.0269
47.6628 24.0 192 6.0083
47.6583 25.0 200 5.9912
47.6092 26.0 208 5.9664
47.0146 27.0 216 5.9539
46.7583 28.0 224 5.9335
46.6195 29.0 232 5.9224
46.8712 30.0 240 5.9060
47.1065 31.0 248 5.8899
46.4521 32.0 256 5.8798
46.4327 33.0 264 5.8634
46.3667 34.0 272 5.8600
46.4804 35.0 280 5.8408
45.9928 36.0 288 5.8330
45.3194 37.0 296 5.8226
45.4577 38.0 304 5.8084
45.894 39.0 312 5.7968
45.5188 40.0 320 5.7803
45.089 41.0 328 5.7683
45.1928 42.0 336 5.7539
44.9258 43.0 344 5.7492
44.4785 44.0 352 5.7356
44.3712 45.0 360 5.7214
43.9926 46.0 368 5.7105
44.0521 47.0 376 5.7025
44.2681 48.0 384 5.6864
43.8842 49.0 392 5.6915
44.0292 50.0 400 5.6721
43.7999 51.0 408 5.6665
43.0278 52.0 416 5.6557
43.1901 53.0 424 5.6518
42.8269 54.0 432 5.6407
42.9919 55.0 440 5.6367
43.0373 56.0 448 5.6384
42.7188 57.0 456 5.6362
42.5106 58.0 464 5.6234
42.3533 59.0 472 5.6280
42.2024 60.0 480 5.6280
41.6783 61.0 488 5.6185
41.9271 62.0 496 5.6186
42.1699 63.0 504 5.6138
41.4923 64.0 512 5.6225
41.122 65.0 520 5.6251
41.4343 66.0 528 5.6240
40.9466 67.0 536 5.6304
40.9043 68.0 544 5.6287
40.9882 69.0 552 5.6278
40.5807 70.0 560 5.6396
40.5174 71.0 568 5.6373
40.4843 72.0 576 5.6529
40.3875 73.0 584 5.6563
39.5228 74.0 592 5.6588
39.7395 75.0 600 5.6627
39.4503 76.0 608 5.6814
39.2378 77.0 616 5.6879
39.0755 78.0 624 5.6979
38.5196 79.0 632 5.7068
39.035 80.0 640 5.7060
38.6452 81.0 648 5.7167
38.6207 82.0 656 5.7241
38.1154 83.0 664 5.7488
38.2519 84.0 672 5.7631
38.2858 85.0 680 5.7695
37.5524 86.0 688 5.7807
37.6586 87.0 696 5.7915
37.2535 88.0 704 5.7989
37.321 89.0 712 5.8139
37.6865 90.0 720 5.8240
36.9457 91.0 728 5.8359
36.7254 92.0 736 5.8496
36.7085 93.0 744 5.8748
36.0782 94.0 752 5.8857
36.519 95.0 760 5.8849
36.0224 96.0 768 5.9069
35.9905 97.0 776 5.9277
35.7604 98.0 784 5.9370
35.4492 99.0 792 5.9565
35.3659 100.0 800 5.9697
34.8069 101.0 808 5.9807
34.7119 102.0 816 6.0135
34.8099 103.0 824 6.0244
34.6587 104.0 832 6.0381
34.42 105.0 840 6.0448
34.2998 106.0 848 6.0663
34.5008 107.0 856 6.0797
33.9325 108.0 864 6.1035
33.4774 109.0 872 6.1207
33.7893 110.0 880 6.1308
33.2222 111.0 888 6.1456
33.1013 112.0 896 6.1646
32.8745 113.0 904 6.1641
32.7694 114.0 912 6.1864
32.7389 115.0 920 6.1929
32.141 116.0 928 6.2157
32.3931 117.0 936 6.2433
32.1716 118.0 944 6.2583
31.7071 119.0 952 6.2781
31.7303 120.0 960 6.2829
31.6201 121.0 968 6.2976
31.6703 122.0 976 6.3261
31.4431 123.0 984 6.3187
31.1313 124.0 992 6.3464
31.1125 125.0 1000 6.3596
30.5546 126.0 1008 6.3701
30.5229 127.0 1016 6.3936
30.5583 128.0 1024 6.4039
29.9087 129.0 1032 6.4209
30.1083 130.0 1040 6.4406
30.1639 131.0 1048 6.4563
30.0241 132.0 1056 6.4515
29.7109 133.0 1064 6.4762
29.6976 134.0 1072 6.5036
29.1271 135.0 1080 6.5081
29.1462 136.0 1088 6.5219
29.1602 137.0 1096 6.5431
28.9582 138.0 1104 6.5525
28.9125 139.0 1112 6.5714
28.806 140.0 1120 6.5838
28.3943 141.0 1128 6.5918
28.4897 142.0 1136 6.6051
28.3459 143.0 1144 6.6368
28.1586 144.0 1152 6.6319
27.8438 145.0 1160 6.6384
27.7995 146.0 1168 6.6728
27.642 147.0 1176 6.6887
27.4882 148.0 1184 6.6989
27.3302 149.0 1192 6.7170
27.3918 150.0 1200 6.7393
27.0773 151.0 1208 6.7356
27.0217 152.0 1216 6.7600
26.8145 153.0 1224 6.7603
26.7394 154.0 1232 6.7931
26.4808 155.0 1240 6.8029
26.4208 156.0 1248 6.8085
26.3699 157.0 1256 6.8130
26.0391 158.0 1264 6.8273
26.0982 159.0 1272 6.8488
25.9146 160.0 1280 6.8587
25.615 161.0 1288 6.8739
25.7097 162.0 1296 6.8933
25.6718 163.0 1304 6.8981
25.4624 164.0 1312 6.9169
25.5031 165.0 1320 6.9219
25.1747 166.0 1328 6.9331
25.2821 167.0 1336 6.9415
24.952 168.0 1344 6.9583
24.9371 169.0 1352 6.9517
24.8954 170.0 1360 6.9748
24.6579 171.0 1368 6.9792
24.6528 172.0 1376 7.0091
24.5318 173.0 1384 7.0151
24.5616 174.0 1392 7.0329
24.3891 175.0 1400 7.0341
24.2457 176.0 1408 7.0527
24.0944 177.0 1416 7.0548
24.0057 178.0 1424 7.0630
23.9593 179.0 1432 7.0800
23.6644 180.0 1440 7.0821
23.5629 181.0 1448 7.0964
23.6401 182.0 1456 7.1129
23.4732 183.0 1464 7.1162
23.4746 184.0 1472 7.1237
23.4217 185.0 1480 7.1419
23.1151 186.0 1488 7.1519
23.1082 187.0 1496 7.1609
22.8879 188.0 1504 7.1679
22.9213 189.0 1512 7.1831
22.7957 190.0 1520 7.2021
22.752 191.0 1528 7.2111
22.5843 192.0 1536 7.2145
22.5194 193.0 1544 7.2202
22.3924 194.0 1552 7.2330
22.4612 195.0 1560 7.2520
22.2972 196.0 1568 7.2524
22.344 197.0 1576 7.2646
22.2342 198.0 1584 7.2734
22.0022 199.0 1592 7.2832
21.9061 200.0 1600 7.2880
21.9604 201.0 1608 7.3081
21.881 202.0 1616 7.3065
21.8628 203.0 1624 7.3140
21.6104 204.0 1632 7.3281
21.5813 205.0 1640 7.3365
21.4344 206.0 1648 7.3373
21.3925 207.0 1656 7.3413
21.4712 208.0 1664 7.3493
21.4242 209.0 1672 7.3673
21.2882 210.0 1680 7.3698
21.1642 211.0 1688 7.3790
21.3028 212.0 1696 7.3870
21.0309 213.0 1704 7.3983
20.9613 214.0 1712 7.4027
20.8984 215.0 1720 7.4098
20.7997 216.0 1728 7.4210
20.807 217.0 1736 7.4261
20.6767 218.0 1744 7.4297
20.7705 219.0 1752 7.4345
20.5957 220.0 1760 7.4424
20.5179 221.0 1768 7.4460
20.4847 222.0 1776 7.4661
20.4177 223.0 1784 7.4693
20.4464 224.0 1792 7.4808
20.3081 225.0 1800 7.4806
20.2896 226.0 1808 7.4959
20.2046 227.0 1816 7.4982
20.1841 228.0 1824 7.5004
20.0097 229.0 1832 7.5108
19.9862 230.0 1840 7.5133
20.0236 231.0 1848 7.5209
19.9825 232.0 1856 7.5214
19.7969 233.0 1864 7.5406
19.7643 234.0 1872 7.5409
19.7543 235.0 1880 7.5429
19.6646 236.0 1888 7.5544
19.5819 237.0 1896 7.5582
19.6317 238.0 1904 7.5659
19.5744 239.0 1912 7.5676
19.5507 240.0 1920 7.5697
19.4746 241.0 1928 7.5820
19.4941 242.0 1936 7.5906
19.3649 243.0 1944 7.5923
19.3889 244.0 1952 7.6013
19.231 245.0 1960 7.6031
19.2543 246.0 1968 7.6026
19.2875 247.0 1976 7.6220
19.1089 248.0 1984 7.6273
19.1875 249.0 1992 7.6250
19.0476 250.0 2000 7.6326
19.0897 251.0 2008 7.6312
18.9209 252.0 2016 7.6415
18.9494 253.0 2024 7.6475
18.9372 254.0 2032 7.6421
18.8355 255.0 2040 7.6577
18.7998 256.0 2048 7.6529
18.7709 257.0 2056 7.6612
18.748 258.0 2064 7.6691
18.7045 259.0 2072 7.6765
18.7101 260.0 2080 7.6702
18.6348 261.0 2088 7.6823
18.701 262.0 2096 7.6870
18.6114 263.0 2104 7.6880
18.5562 264.0 2112 7.6928
18.5876 265.0 2120 7.7002
18.4442 266.0 2128 7.7010
18.4767 267.0 2136 7.7050
18.4078 268.0 2144 7.7098
18.3392 269.0 2152 7.7208
18.3327 270.0 2160 7.7156
18.237 271.0 2168 7.7137
18.311 272.0 2176 7.7235
18.1857 273.0 2184 7.7313
18.1888 274.0 2192 7.7284
18.1776 275.0 2200 7.7395
18.1628 276.0 2208 7.7388
18.1172 277.0 2216 7.7414
18.091 278.0 2224 7.7443
18.0491 279.0 2232 7.7472
18.0176 280.0 2240 7.7535
18.031 281.0 2248 7.7559
17.914 282.0 2256 7.7597
17.9612 283.0 2264 7.7597
17.8927 284.0 2272 7.7687
17.9148 285.0 2280 7.7630
17.8205 286.0 2288 7.7739
17.8424 287.0 2296 7.7761
17.8858 288.0 2304 7.7774
17.7969 289.0 2312 7.7812
17.8299 290.0 2320 7.7850
17.7934 291.0 2328 7.7836
17.7146 292.0 2336 7.7872
17.7186 293.0 2344 7.7902
17.6425 294.0 2352 7.7921
17.7296 295.0 2360 7.7991
17.6579 296.0 2368 7.8004
17.6292 297.0 2376 7.7980
17.7181 298.0 2384 7.8062
17.5439 299.0 2392 7.8002
17.5962 300.0 2400 7.8089
17.6127 301.0 2408 7.8101
17.5354 302.0 2416 7.8132
17.5389 303.0 2424 7.8188
17.527 304.0 2432 7.8133
17.5128 305.0 2440 7.8140
17.4388 306.0 2448 7.8211
17.4295 307.0 2456 7.8224
17.4627 308.0 2464 7.8243
17.485 309.0 2472 7.8272
17.4027 310.0 2480 7.8292
17.4191 311.0 2488 7.8275
17.3765 312.0 2496 7.8291
17.4451 313.0 2504 7.8346
17.3163 314.0 2512 7.8383
17.3568 315.0 2520 7.8369
17.3954 316.0 2528 7.8382
17.3581 317.0 2536 7.8407
17.3554 318.0 2544 7.8381
17.324 319.0 2552 7.8429
17.3459 320.0 2560 7.8437
17.2876 321.0 2568 7.8437
17.3113 322.0 2576 7.8462
17.3675 323.0 2584 7.8481
17.2084 324.0 2592 7.8474
17.3408 325.0 2600 7.8543
17.2688 326.0 2608 7.8485
17.2164 327.0 2616 7.8505
17.3126 328.0 2624 7.8525
17.1989 329.0 2632 7.8548
17.242 330.0 2640 7.8554
17.1757 331.0 2648 7.8563
17.2276 332.0 2656 7.8569
17.1629 333.0 2664 7.8550
17.2025 334.0 2672 7.8603
17.1989 335.0 2680 7.8595
17.1648 336.0 2688 7.8624
17.1324 337.0 2696 7.8580
17.1679 338.0 2704 7.8619
17.1706 339.0 2712 7.8619
17.2218 340.0 2720 7.8616
17.2003 341.0 2728 7.8641
17.1316 342.0 2736 7.8644
17.1438 343.0 2744 7.8648
17.1045 344.0 2752 7.8657
17.1118 345.0 2760 7.8674
17.127 346.0 2768 7.8666
17.099 347.0 2776 7.8672
17.0868 348.0 2784 7.8677
17.1284 349.0 2792 7.8692
17.0982 350.0 2800 7.8677
17.1447 351.0 2808 7.8681
17.0814 352.0 2816 7.8676
17.0835 353.0 2824 7.8688
17.1603 354.0 2832 7.8689
17.0995 355.0 2840 7.8687
17.1009 356.0 2848 7.8692
17.0391 357.0 2856 7.8691
17.1428 358.0 2864 7.8695
17.0558 359.0 2872 7.8695
17.0703 360.0 2880 7.8691
17.0786 361.0 2888 7.8692
17.0794 362.0 2896 7.8692
17.0163 363.0 2904 7.8692
17.0753 364.0 2912 7.8696
17.0689 365.0 2920 7.8698
17.1054 366.0 2928 7.8699
17.112 367.0 2936 7.8698
17.0938 368.0 2944 7.8698
17.0279 369.0 2952 7.8699
17.017 370.0 2960 7.8701
17.0663 371.0 2968 7.8702
17.1667 372.0 2976 7.8703
17.0397 373.0 2984 7.8703
17.0594 374.0 2992 7.8703
17.1148 375.0 3000 7.8703
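
The validation loss bottoms out at 5.6138 around epoch 63 and rises steadily afterwards while the training loss keeps falling, a typical overfitting pattern; the final checkpoint (7.8703) is far from the best one. A minimal scan over a few (epoch, validation loss) pairs copied from the table above locates the minimum:

```python
# A handful of rows around the minimum, copied from the results table.
val_loss = {
    61: 5.6185,
    62: 5.6186,
    63: 5.6138,
    64: 5.6225,
    65: 5.6251,
}

# Epoch with the lowest validation loss among the sampled rows.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # 63 5.6138
```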

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model details

  • Downloads last month: 3
  • Model size: 126M parameters
  • Tensor type: F32 (Safetensors)

Collection including IParraMartin/impossible-llms-dutch-random