impossible-llms-dutch-fronting-bigram

This model is a fine-tuned version of an unspecified base model (the name is missing from this card), trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.4259
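
The reported loss matches the last validation loss logged in the training results (step 3000). As a rough aid to reading it, the sketch below converts the value to perplexity and bits per token. This assumes the loss is a mean per-token cross-entropy in nats, which the card does not state. Because label_smoothing_factor is 0.1, the logged value may also include the smoothing term, so treat the result as an approximation rather than a true perplexity.

```python
import math

# Evaluation loss copied from this card.
eval_loss = 7.4259

# Assumption: the loss is a mean per-token cross-entropy measured in nats.
perplexity = math.exp(eval_loss)
bits_per_token = eval_loss / math.log(2)

print(f"perplexity ~ {perplexity:.1f}")
print(f"bits per token ~ {bits_per_token:.3f}")
```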

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
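
The total batch sizes in the list follow from the per-device values, and the warmup ratio fixes the number of warmup steps. The sketch below reproduces that arithmetic and gives one plausible shape for the learning-rate curve: linear warmup followed by a half-cosine decay to zero. The function name `lr_at` is hypothetical, and the exact curve used in training is an assumption, since the card only states `cosine` with a warmup ratio of 0.1.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 12            # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 8
learning_rate = 1e-4
training_steps = 3000
warmup_ratio = 0.1

# Samples consumed per optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices
warmup_steps = math.ceil(training_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, warmup_steps)
for step in (0, 150, 300, 1650, 3000):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends and decays towards zero by step 3000.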

Training results

Training Loss Epoch Step Validation Loss
83.6673 0.9180 7 10.1956
75.7206 1.9180 14 9.3458
72.3098 2.9180 21 8.9449
70.2821 3.9180 28 8.7390
69.0478 4.9180 35 8.5625
68.338 5.9180 42 8.4006
66.1562 6.9180 49 8.2220
64.9807 7.9180 56 8.0375
63.2723 8.9180 63 7.8572
61.0906 9.9180 70 7.6713
60.2245 10.9180 77 7.4791
59.196 11.9180 84 7.2779
56.8529 12.9180 91 7.0854
55.7183 13.9180 98 6.8852
54.0638 14.9180 105 6.6934
52.6141 15.9180 112 6.5213
51.2209 16.9180 119 6.3724
49.964 17.9180 126 6.2568
49.4053 18.9180 133 6.1501
48.3958 19.9180 140 6.0563
48.2846 20.9180 147 6.0040
47.9167 21.9180 154 5.9494
47.3996 22.9180 161 5.9178
46.9891 23.9180 168 5.8925
46.7405 24.9180 175 5.8723
46.6686 25.9180 182 5.8477
46.6996 26.9180 189 5.8223
46.5263 27.9180 196 5.8009
46.1017 28.9180 203 5.7802
46.1647 29.9180 210 5.7587
46.1546 30.9180 217 5.7441
45.7013 31.9180 224 5.7213
45.0503 32.9180 231 5.7076
45.2251 33.9180 238 5.6904
44.8943 34.9180 245 5.6772
44.9269 35.9180 252 5.6669
44.902 36.9180 259 5.6532
44.2823 37.9180 266 5.6402
44.6306 38.9180 273 5.6322
44.1941 39.9180 280 5.6278
44.4267 40.9180 287 5.6130
44.0962 41.9180 294 5.6022
43.9981 42.9180 301 5.5818
44.065 43.9180 308 5.5700
44.0451 44.9180 315 5.5602
43.7268 45.9180 322 5.5563
43.3324 46.9180 329 5.5343
43.1901 47.9180 336 5.5254
43.2259 48.9180 343 5.5198
43.2377 49.9180 350 5.4982
43.2266 50.9180 357 5.4944
42.7451 51.9180 364 5.4823
42.2536 52.9180 371 5.4781
42.2201 53.9180 378 5.4613
41.6915 54.9180 385 5.4499
42.0139 55.9180 392 5.4467
41.6444 56.9180 399 5.4400
41.2884 57.9180 406 5.4345
41.7892 58.9180 413 5.4238
41.3308 59.9180 420 5.4166
41.3869 60.9180 427 5.4145
40.7788 61.9180 434 5.4142
40.7967 62.9180 441 5.3992
40.3866 63.9180 448 5.4041
40.5721 64.9180 455 5.3941
40.5428 65.9180 462 5.3920
40.3906 66.9180 469 5.3858
40.151 67.9180 476 5.3839
39.6923 68.9180 483 5.3856
39.939 69.9180 490 5.3755
39.6868 70.9180 497 5.3749
39.6195 71.9180 504 5.3781
39.4678 72.9180 511 5.3854
39.1837 73.9180 518 5.3784
38.9759 74.9180 525 5.3770
38.5049 75.9180 532 5.3767
39.226 76.9180 539 5.3781
39.0352 77.9180 546 5.3835
38.1275 78.9180 553 5.3864
38.3793 79.9180 560 5.3914
38.0253 80.9180 567 5.3863
37.9907 81.9180 574 5.3995
38.0236 82.9180 581 5.4028
37.6736 83.9180 588 5.4061
38.1138 84.9180 595 5.4162
37.4033 85.9180 602 5.4258
37.1398 86.9180 609 5.4253
36.6317 87.9180 616 5.4345
36.4137 88.9180 623 5.4452
36.9893 89.9180 630 5.4385
36.8399 90.9180 637 5.4598
36.3405 91.9180 644 5.4627
36.4991 92.9180 651 5.4632
36.3748 93.9180 658 5.4811
35.9608 94.9180 665 5.4745
35.9226 95.9180 672 5.4863
35.9751 96.9180 679 5.5081
35.5547 97.9180 686 5.4970
35.2164 98.9180 693 5.5214
34.9061 99.9180 700 5.5231
34.9274 100.9180 707 5.5416
34.9502 101.9180 714 5.5474
34.9347 102.9180 721 5.5456
34.5548 103.9180 728 5.5700
34.8556 104.9180 735 5.5781
34.6057 105.9180 742 5.5848
34.4437 106.9180 749 5.5971
34.1718 107.9180 756 5.6073
34.0907 108.9180 763 5.6244
33.6848 109.9180 770 5.6291
33.5163 110.9180 777 5.6382
33.4478 111.9180 784 5.6576
33.3843 112.9180 791 5.6750
32.958 113.9180 798 5.6639
33.0119 114.9180 805 5.6917
32.8102 115.9180 812 5.6936
32.6526 116.9180 819 5.7121
32.4103 117.9180 826 5.7240
32.1539 118.9180 833 5.7442
32.4264 119.9180 840 5.7544
32.255 120.9180 847 5.7611
32.1105 121.9180 854 5.7661
31.9384 122.9180 861 5.7819
31.7077 123.9180 868 5.7929
31.4754 124.9180 875 5.8061
31.2536 125.9180 882 5.8106
31.2701 126.9180 889 5.8371
30.912 127.9180 896 5.8546
30.834 128.9180 903 5.8566
30.8271 129.9180 910 5.8626
30.6406 130.9180 917 5.8845
30.1948 131.9180 924 5.9023
30.5228 132.9180 931 5.9115
30.3403 133.9180 938 5.9134
29.9515 134.9180 945 5.9286
29.9577 135.9180 952 5.9542
29.677 136.9180 959 5.9454
29.6693 137.9180 966 5.9675
29.4862 138.9180 973 5.9812
29.1925 139.9180 980 5.9978
29.3036 140.9180 987 6.0049
29.0343 141.9180 994 6.0295
28.8367 142.9180 1001 6.0296
28.5935 143.9180 1008 6.0414
28.5507 144.9180 1015 6.0576
28.5536 145.9180 1022 6.0778
28.5476 146.9180 1029 6.0790
28.4762 147.9180 1036 6.0833
28.3717 148.9180 1043 6.1077
28.0584 149.9180 1050 6.1150
27.8302 150.9180 1057 6.1315
27.6707 151.9180 1064 6.1347
27.4556 152.9180 1071 6.1526
27.6272 153.9180 1078 6.1656
27.1733 154.9180 1085 6.1874
27.266 155.9180 1092 6.1981
27.0323 156.9180 1099 6.2104
26.9986 157.9180 1106 6.2195
26.873 158.9180 1113 6.2223
26.6617 159.9180 1120 6.2467
26.4292 160.9180 1127 6.2545
26.4855 161.9180 1134 6.2671
26.215 162.9180 1141 6.2774
26.1489 163.9180 1148 6.2915
26.3328 164.9180 1155 6.2961
25.8904 165.9180 1162 6.3048
25.8863 166.9180 1169 6.3237
25.8484 167.9180 1176 6.3325
25.6263 168.9180 1183 6.3506
25.4471 169.9180 1190 6.3469
25.4162 170.9180 1197 6.3608
25.3895 171.9180 1204 6.3787
25.0886 172.9180 1211 6.3826
25.051 173.9180 1218 6.3959
24.9231 174.9180 1225 6.4132
24.6853 175.9180 1232 6.4270
24.8123 176.9180 1239 6.4290
24.5636 177.9180 1246 6.4452
24.5249 178.9180 1253 6.4572
24.25 179.9180 1260 6.4652
24.2591 180.9180 1267 6.4670
24.0907 181.9180 1274 6.4866
24.013 182.9180 1281 6.4951
23.8728 183.9180 1288 6.5100
23.9411 184.9180 1295 6.5066
23.7083 185.9180 1302 6.5290
23.7176 186.9180 1309 6.5326
23.5182 187.9180 1316 6.5541
23.3504 188.9180 1323 6.5660
23.4175 189.9180 1330 6.5668
23.217 190.9180 1337 6.5860
23.1964 191.9180 1344 6.5914
22.909 192.9180 1351 6.5891
22.9406 193.9180 1358 6.6052
22.9394 194.9180 1365 6.6192
22.8075 195.9180 1372 6.6292
22.6948 196.9180 1379 6.6334
22.6171 197.9180 1386 6.6415
22.5568 198.9180 1393 6.6495
22.5281 199.9180 1400 6.6619
22.3907 200.9180 1407 6.6728
22.2757 201.9180 1414 6.6944
22.3475 202.9180 1421 6.6898
22.1154 203.9180 1428 6.6898
22.0423 204.9180 1435 6.7088
21.9327 205.9180 1442 6.7211
21.7245 206.9180 1449 6.7252
21.8766 207.9180 1456 6.7375
21.6527 208.9180 1463 6.7348
21.7386 209.9180 1470 6.7472
21.6643 210.9180 1477 6.7558
21.3764 211.9180 1484 6.7740
21.3917 212.9180 1491 6.7777
21.4281 213.9180 1498 6.7824
21.1757 214.9180 1505 6.7908
21.2494 215.9180 1512 6.7976
21.1086 216.9180 1519 6.8047
21.0251 217.9180 1526 6.8199
20.8031 218.9180 1533 6.8263
20.9154 219.9180 1540 6.8258
20.8454 220.9180 1547 6.8398
20.6033 221.9180 1554 6.8448
20.5957 222.9180 1561 6.8610
20.6872 223.9180 1568 6.8638
20.3843 224.9180 1575 6.8703
20.4692 225.9180 1582 6.8917
20.4153 226.9180 1589 6.8809
20.2434 227.9180 1596 6.8948
20.3918 228.9180 1603 6.9083
20.1626 229.9180 1610 6.9046
20.261 230.9180 1617 6.9204
20.0765 231.9180 1624 6.9252
19.9718 232.9180 1631 6.9314
19.9375 233.9180 1638 6.9385
19.9831 234.9180 1645 6.9447
19.9072 235.9180 1652 6.9436
19.7478 236.9180 1659 6.9597
19.6264 237.9180 1666 6.9608
19.5997 238.9180 1673 6.9732
19.5747 239.9180 1680 6.9861
19.4862 240.9180 1687 6.9806
19.4703 241.9180 1694 6.9854
19.4437 242.9180 1701 6.9917
19.3692 243.9180 1708 7.0022
19.2464 244.9180 1715 7.0061
19.2336 245.9180 1722 7.0171
19.1552 246.9180 1729 7.0197
19.1323 247.9180 1736 7.0175
19.077 248.9180 1743 7.0307
19.0595 249.9180 1750 7.0348
18.9852 250.9180 1757 7.0430
18.9572 251.9180 1764 7.0453
18.993 252.9180 1771 7.0570
18.8989 253.9180 1778 7.0581
18.691 254.9180 1785 7.0699
18.8285 255.9180 1792 7.0669
18.6967 256.9180 1799 7.0757
18.5931 257.9180 1806 7.0807
18.6768 258.9180 1813 7.0865
18.5537 259.9180 1820 7.0934
18.539 260.9180 1827 7.0953
18.3686 261.9180 1834 7.1071
18.4691 262.9180 1841 7.1061
18.2915 263.9180 1848 7.1186
18.1952 264.9180 1855 7.1096
18.121 265.9180 1862 7.1228
18.3078 266.9180 1869 7.1228
18.2565 267.9180 1876 7.1326
18.2128 268.9180 1883 7.1317
18.0796 269.9180 1890 7.1313
18.0938 270.9180 1897 7.1441
18.1037 271.9180 1904 7.1533
18.0365 272.9180 1911 7.1559
17.8114 273.9180 1918 7.1561
17.8991 274.9180 1925 7.1635
17.8748 275.9180 1932 7.1700
17.7981 276.9180 1939 7.1745
17.7959 277.9180 1946 7.1880
17.7333 278.9180 1953 7.1809
17.7708 279.9180 1960 7.1966
17.724 280.9180 1967 7.1890
17.5739 281.9180 1974 7.2033
17.6699 282.9180 1981 7.2033
17.6359 283.9180 1988 7.2064
17.5734 284.9180 1995 7.2038
17.6082 285.9180 2002 7.2078
17.4631 286.9180 2009 7.2178
17.4105 287.9180 2016 7.2228
17.4747 288.9180 2023 7.2299
17.4737 289.9180 2030 7.2346
17.3741 290.9180 2037 7.2325
17.3798 291.9180 2044 7.2365
17.374 292.9180 2051 7.2448
17.2849 293.9180 2058 7.2478
17.3266 294.9180 2065 7.2528
17.2119 295.9180 2072 7.2465
17.2125 296.9180 2079 7.2552
17.1324 297.9180 2086 7.2549
17.0641 298.9180 2093 7.2625
17.095 299.9180 2100 7.2642
17.1231 300.9180 2107 7.2685
16.9683 301.9180 2114 7.2621
17.0125 302.9180 2121 7.2700
17.0242 303.9180 2128 7.2754
16.902 304.9180 2135 7.2836
17.0353 305.9180 2142 7.2812
16.9487 306.9180 2149 7.2794
16.8356 307.9180 2156 7.2958
16.7948 308.9180 2163 7.2922
16.8632 309.9180 2170 7.2923
16.7753 310.9180 2177 7.2994
16.7484 311.9180 2184 7.3001
16.7657 312.9180 2191 7.3089
16.7394 313.9180 2198 7.3070
16.7323 314.9180 2205 7.3080
16.6862 315.9180 2212 7.3139
16.6908 316.9180 2219 7.3175
16.6181 317.9180 2226 7.3198
16.713 318.9180 2233 7.3233
16.5816 319.9180 2240 7.3198
16.5054 320.9180 2247 7.3268
16.6918 321.9180 2254 7.3292
16.6408 322.9180 2261 7.3231
16.5806 323.9180 2268 7.3302
16.5555 324.9180 2275 7.3388
16.5342 325.9180 2282 7.3398
16.4235 326.9180 2289 7.3405
16.5023 327.9180 2296 7.3402
16.4626 328.9180 2303 7.3428
16.3941 329.9180 2310 7.3406
16.4279 330.9180 2317 7.3508
16.4444 331.9180 2324 7.3459
16.3836 332.9180 2331 7.3510
16.4184 333.9180 2338 7.3566
16.3631 334.9180 2345 7.3561
16.3384 335.9180 2352 7.3603
16.3244 336.9180 2359 7.3589
16.2741 337.9180 2366 7.3628
16.304 338.9180 2373 7.3642
16.2746 339.9180 2380 7.3642
16.3027 340.9180 2387 7.3663
16.242 341.9180 2394 7.3688
16.2367 342.9180 2401 7.3676
16.1752 343.9180 2408 7.3685
16.1994 344.9180 2415 7.3741
16.108 345.9180 2422 7.3765
16.1362 346.9180 2429 7.3781
16.2155 347.9180 2436 7.3800
16.2048 348.9180 2443 7.3775
16.1283 349.9180 2450 7.3837
16.1557 350.9180 2457 7.3850
16.1228 351.9180 2464 7.3871
16.0919 352.9180 2471 7.3901
16.0663 353.9180 2478 7.3876
16.0835 354.9180 2485 7.3865
16.0743 355.9180 2492 7.3877
16.1027 356.9180 2499 7.3859
16.0455 357.9180 2506 7.3990
16.0158 358.9180 2513 7.3962
16.1156 359.9180 2520 7.3955
16.0399 360.9180 2527 7.3975
15.9918 361.9180 2534 7.4001
15.9691 362.9180 2541 7.4001
15.9589 363.9180 2548 7.4017
15.9023 364.9180 2555 7.4012
15.9877 365.9180 2562 7.4013
15.9899 366.9180 2569 7.4031
15.859 367.9180 2576 7.4029
16.0035 368.9180 2583 7.4008
15.9595 369.9180 2590 7.4076
15.9003 370.9180 2597 7.4072
15.9346 371.9180 2604 7.4094
15.9182 372.9180 2611 7.4060
15.8456 373.9180 2618 7.4104
15.9017 374.9180 2625 7.4097
15.9036 375.9180 2632 7.4126
15.8435 376.9180 2639 7.4117
15.8846 377.9180 2646 7.4136
15.838 378.9180 2653 7.4110
15.8668 379.9180 2660 7.4139
15.8037 380.9180 2667 7.4129
15.8425 381.9180 2674 7.4157
15.8884 382.9180 2681 7.4145
15.7861 383.9180 2688 7.4151
15.8327 384.9180 2695 7.4176
15.8519 385.9180 2702 7.4165
15.7866 386.9180 2709 7.4200
15.8224 387.9180 2716 7.4191
15.8119 388.9180 2723 7.4184
15.7032 389.9180 2730 7.4195
15.8936 390.9180 2737 7.4206
15.8436 391.9180 2744 7.4201
15.7939 392.9180 2751 7.4215
15.8355 393.9180 2758 7.4204
15.76 394.9180 2765 7.4222
15.822 395.9180 2772 7.4208
15.8252 396.9180 2779 7.4226
15.73 397.9180 2786 7.4216
15.8799 398.9180 2793 7.4218
15.7825 399.9180 2800 7.4225
15.7679 400.9180 2807 7.4242
15.7536 401.9180 2814 7.4230
15.7815 402.9180 2821 7.4241
15.7929 403.9180 2828 7.4240
15.764 404.9180 2835 7.4241
15.7348 405.9180 2842 7.4246
15.7734 406.9180 2849 7.4254
15.7698 407.9180 2856 7.4256
15.783 408.9180 2863 7.4251
15.7823 409.9180 2870 7.4255
15.7662 410.9180 2877 7.4266
15.7578 411.9180 2884 7.4266
15.7922 412.9180 2891 7.4260
15.8236 413.9180 2898 7.4259
15.7965 414.9180 2905 7.4259
15.761 415.9180 2912 7.4260
15.7627 416.9180 2919 7.4259
15.8121 417.9180 2926 7.4259
15.823 418.9180 2933 7.4259
15.7539 419.9180 2940 7.4260
15.6949 420.9180 2947 7.4260
15.772 421.9180 2954 7.4260
15.7556 422.9180 2961 7.4260
15.8064 423.9180 2968 7.4260
15.7151 424.9180 2975 7.4260
15.7625 425.9180 2982 7.4260
15.7778 426.9180 2989 7.4259
15.7579 427.9180 2996 7.4259
15.756 428.5246 3000 7.4259
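
In the table above, validation loss reaches its lowest logged value (5.3749) at step 497 and then rises to 7.4259 by step 3000 while training loss keeps falling, a pattern consistent with overfitting. The sketch below shows one way to find that minimum programmatically. It uses only a handful of rows copied from the table; the `parse` helper and the dictionary keys are hypothetical names chosen for illustration.

```python
# A few rows copied verbatim from the table above, in the order:
# training loss, epoch, step, validation loss.
ROWS = """\
83.6673 0.9180 7 10.1956
39.939 69.9180 490 5.3755
39.6868 70.9180 497 5.3749
39.6195 71.9180 504 5.3781
15.756 428.5246 3000 7.4259
"""


def parse(text):
    """Turn whitespace-separated table rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records


records = parse(ROWS)
best = min(records, key=lambda r: r["val_loss"])
final = records[-1]
gap = final["val_loss"] - best["val_loss"]

print(f"lowest validation loss {best['val_loss']} at step {best['step']}")
print(f"final validation loss is {gap:.4f} higher")
```

Pasting the full table into `ROWS` gives the same minimum, since no other logged row has a lower validation loss.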

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model size: 126M params (Safetensors, tensor type F32)

Collection including IParraMartin/impossible-llms-dutch-fronting-bigram