impossible-llms-german-fronting-bigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading/scoring sketch follows):

  • Loss: 6.0327
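
The snippet below is a minimal sketch of how this checkpoint could be loaded and scored with the Transformers library. It assumes a causal language-modeling head, which the card does not state explicitly; the example sentence is illustrative only.

```python
# Minimal loading/scoring sketch; assumes a causal-LM head (not stated in this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-german-fronting-bigram"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

text = "Den Apfel hat der Junge gegessen."  # illustrative German sentence
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean token-level cross-entropy as .loss
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"loss: {outputs.loss.item():.4f}")
print(f"perplexity: {torch.exp(outputs.loss).item():.1f}")
```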

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an approximate TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
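
For context, the effective training batch size of 384 follows from 12 sequences per device × 4 GPUs × 8 gradient-accumulation steps. The block below is an approximate reconstruction of these settings as a transformers.TrainingArguments object; the actual training script is not published, so the output directory and any unlisted options are assumptions.

```python
# Approximate reconstruction of the listed hyperparameters; treat this as a sketch,
# not the exact configuration used to train this checkpoint.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="impossible-llms-german-fronting-bigram",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,  # 12 per device x 4 GPUs x 8 steps = 384 effective
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                      # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
# The 4-GPU distributed setup is handled by the launcher (e.g. torchrun or accelerate),
# not by a TrainingArguments flag.
```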

Training results

Training Loss Epoch Step Validation Loss
76.0331 0.9697 16 9.4338
71.6673 1.9697 32 8.9567
69.3824 2.9697 48 8.6316
66.0393 3.9697 64 8.2404
62.9205 4.9697 80 7.8178
59.3477 5.9697 96 7.3733
56.0467 6.9697 112 6.9478
53.3128 7.9697 128 6.6167
51.4884 8.9697 144 6.4220
50.5338 9.9697 160 6.3133
49.7489 10.9697 176 6.2331
49.5029 11.9697 192 6.1774
48.9784 12.9697 208 6.1174
48.484 13.9697 224 6.0770
48.1829 14.9697 240 6.0347
47.8824 15.9697 256 5.9986
47.6146 16.9697 272 5.9695
47.5391 17.9697 288 5.9389
46.8092 18.9697 304 5.9169
46.6895 19.9697 320 5.8892
46.41 20.9697 336 5.8594
46.259 21.9697 352 5.8347
45.916 22.9697 368 5.8068
45.7068 23.9697 384 5.7844
45.0942 24.9697 400 5.7543
45.2343 25.9697 416 5.7346
44.4914 26.9697 432 5.7001
44.4128 27.9697 448 5.6704
44.3277 28.9697 464 5.6436
44.281 29.9697 480 5.6213
43.7328 30.9697 496 5.5903
43.4948 31.9697 512 5.5636
42.9645 32.9697 528 5.5445
42.8097 33.9697 544 5.5143
42.9638 34.9697 560 5.4895
42.2659 35.9697 576 5.4703
42.2061 36.9697 592 5.4521
41.8982 37.9697 608 5.4312
41.7378 38.9697 624 5.4206
41.6193 39.9697 640 5.3960
41.4941 40.9697 656 5.3849
41.14 41.9697 672 5.3676
41.0429 42.9697 688 5.3546
40.7736 43.9697 704 5.3389
40.571 44.9697 720 5.3348
40.1309 45.9697 736 5.3256
40.3361 46.9697 752 5.3146
40.1762 47.9697 768 5.3130
39.6508 48.9697 784 5.2994
39.7313 49.9697 800 5.2926
39.1776 50.9697 816 5.2911
38.9937 51.9697 832 5.2884
39.0236 52.9697 848 5.2826
38.802 53.9697 864 5.2838
38.5939 54.9697 880 5.2746
38.3015 55.9697 896 5.2809
38.4052 56.9697 912 5.2786
38.1407 57.9697 928 5.2766
37.8546 58.9697 944 5.2801
37.8129 59.9697 960 5.2827
37.7701 60.9697 976 5.2802
37.2836 61.9697 992 5.2886
37.4281 62.9697 1008 5.2888
37.1109 63.9697 1024 5.2970
36.7481 64.9697 1040 5.3024
36.7859 65.9697 1056 5.3057
36.6616 66.9697 1072 5.3054
36.4677 67.9697 1088 5.3106
36.3254 68.9697 1104 5.3140
36.2414 69.9697 1120 5.3257
36.0447 70.9697 1136 5.3329
35.8527 71.9697 1152 5.3406
35.6302 72.9697 1168 5.3465
35.3203 73.9697 1184 5.3598
35.3191 74.9697 1200 5.3631
35.0504 75.9697 1216 5.3752
34.8329 76.9697 1232 5.3791
34.9235 77.9697 1248 5.3881
34.6245 78.9697 1264 5.3958
34.5753 79.9697 1280 5.4096
34.4142 80.9697 1296 5.4234
34.1888 81.9697 1312 5.4222
33.9786 82.9697 1328 5.4345
34.1911 83.9697 1344 5.4443
33.795 84.9697 1360 5.4489
33.8146 85.9697 1376 5.4639
33.5029 86.9697 1392 5.4749
33.2093 87.9697 1408 5.4868
33.4014 88.9697 1424 5.4939
33.2127 89.9697 1440 5.5026
33.0343 90.9697 1456 5.5170
33.004 91.9697 1472 5.5275
32.6994 92.9697 1488 5.5386
32.7135 93.9697 1504 5.5470
32.5986 94.9697 1520 5.5595
32.5105 95.9697 1536 5.5684
32.0216 96.9697 1552 5.5737
32.1599 97.9697 1568 5.5834
32.0658 98.9697 1584 5.5970
31.9547 99.9697 1600 5.6053
31.7688 100.9697 1616 5.6189
31.6648 101.9697 1632 5.6250
31.4364 102.9697 1648 5.6369
31.5234 103.9697 1664 5.6476
31.5321 104.9697 1680 5.6584
31.4901 105.9697 1696 5.6663
31.2033 106.9697 1712 5.6770
31.018 107.9697 1728 5.6838
30.947 108.9697 1744 5.6970
30.7787 109.9697 1760 5.6996
30.7756 110.9697 1776 5.7140
30.507 111.9697 1792 5.7222
30.6232 112.9697 1808 5.7301
30.5118 113.9697 1824 5.7433
30.1597 114.9697 1840 5.7478
30.3279 115.9697 1856 5.7577
30.0896 116.9697 1872 5.7650
30.1555 117.9697 1888 5.7767
29.8685 118.9697 1904 5.7873
29.8582 119.9697 1920 5.7923
29.9249 120.9697 1936 5.8030
29.8765 121.9697 1952 5.8111
29.7486 122.9697 1968 5.8230
29.6922 123.9697 1984 5.8276
29.4026 124.9697 2000 5.8362
29.3903 125.9697 2016 5.8376
29.4068 126.9697 2032 5.8466
29.4381 127.9697 2048 5.8547
29.1938 128.9697 2064 5.8639
29.221 129.9697 2080 5.8655
29.2049 130.9697 2096 5.8743
29.1575 131.9697 2112 5.8820
29.1866 132.9697 2128 5.8838
28.8952 133.9697 2144 5.8972
28.902 134.9697 2160 5.9019
28.8144 135.9697 2176 5.9041
28.8061 136.9697 2192 5.9117
28.7095 137.9697 2208 5.9188
28.6465 138.9697 2224 5.9240
28.5635 139.9697 2240 5.9280
28.5905 140.9697 2256 5.9319
28.5319 141.9697 2272 5.9402
28.3719 142.9697 2288 5.9453
28.3483 143.9697 2304 5.9481
28.2941 144.9697 2320 5.9543
28.3381 145.9697 2336 5.9589
28.2441 146.9697 2352 5.9611
28.2837 147.9697 2368 5.9652
28.3381 148.9697 2384 5.9722
28.0723 149.9697 2400 5.9747
28.2182 150.9697 2416 5.9805
28.1211 151.9697 2432 5.9822
28.1113 152.9697 2448 5.9845
28.0931 153.9697 2464 5.9882
28.0684 154.9697 2480 5.9908
27.96 155.9697 2496 5.9967
28.1077 156.9697 2512 5.9980
27.7652 157.9697 2528 6.0017
28.0139 158.9697 2544 6.0054
27.872 159.9697 2560 6.0063
27.7807 160.9697 2576 6.0101
27.9509 161.9697 2592 6.0112
27.9565 162.9697 2608 6.0139
27.74 163.9697 2624 6.0165
27.8607 164.9697 2640 6.0191
27.7131 165.9697 2656 6.0194
27.6894 166.9697 2672 6.0211
27.7251 167.9697 2688 6.0234
27.7298 168.9697 2704 6.0243
27.6216 169.9697 2720 6.0259
27.7812 170.9697 2736 6.0269
27.7094 171.9697 2752 6.0277
27.7338 172.9697 2768 6.0286
27.6887 173.9697 2784 6.0289
27.69 174.9697 2800 6.0292
27.7041 175.9697 2816 6.0303
27.6164 176.9697 2832 6.0305
27.493 177.9697 2848 6.0307
27.7736 178.9697 2864 6.0321
27.5697 179.9697 2880 6.0323
27.4724 180.9697 2896 6.0323
27.4104 181.9697 2912 6.0325
27.4961 182.9697 2928 6.0327
27.7078 183.9697 2944 6.0327
27.6902 184.9697 2960 6.0328
27.6065 185.9697 2976 6.0328
27.5061 186.9697 2992 6.0327
27.7366 187.4848 3000 6.0327
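
Reading the table, validation loss reaches its minimum of roughly 5.27 around step 880 and then rises steadily, ending at 6.0327 at step 3000. As a rough reference point, that final loss corresponds to a perplexity of about 417, although the label smoothing factor of 0.1 means the reported loss is not a pure cross-entropy, so the conversion below is only indicative.

```python
import math

final_val_loss = 6.0327
print(math.exp(final_val_loss))  # ~416.8; indicative only, since the loss includes label smoothing
```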

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
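
To check that a local environment matches the versions listed above, the prints below report the installed versions (expected values in comments; the PyTorch CUDA build depends on the local installation).

```python
# Quick environment check against the versions listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected 4.49.0
print(torch.__version__)         # expected 2.4.0+cu121 (CUDA build may differ locally)
print(datasets.__version__)      # expected 3.4.0
print(tokenizers.__version__)    # expected 0.21.0
```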

Model size

  • 126M params (Safetensors, F32)