impossible-llms-german-random-trigram

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.7497
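
For scale: if the reported loss is the mean per-token cross-entropy in nats, the corresponding perplexity is exp(5.7497) ≈ 314. Note that training used a label smoothing factor of 0.1 (see below); if the evaluation loss includes that smoothing term, exp(loss) overstates the model's true perplexity. A minimal check:

```python
import math

eval_loss = 5.7497           # reported evaluation loss
print(math.exp(eval_loss))   # ≈ 314.2, the perplexity if loss is nats/token
```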

Model description

More information needed

Intended uses & limitations

More information needed
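
Since no usage details are given, the sketch below shows only how such a checkpoint would typically be loaded. It assumes (unconfirmed by this card) that the model exposes a causal-LM head; the repo id is taken from the collection this model belongs to.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: causal language model. The repo id comes from the collection
# name on the hub page; adjust if the checkpoint lives elsewhere.
repo_id = "IParraMartin/impossible-llms-german-random-trigram"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
```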

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch reproducing them follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
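
For concreteness, the list above maps onto the transformers Trainer configuration roughly as follows. This is a minimal sketch, not the authors' actual script: output_dir is a placeholder, and fp16 is inferred from "Native AMP".

```python
from transformers import TrainingArguments

# Sketch of TrainingArguments matching the hyperparameters listed above.
# output_dir is hypothetical; all other values come from this card.
training_args = TrainingArguments(
    output_dir="impossible-llms-german-random-trigram",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,   # x 4 GPUs x 8 accumulation = 384 total
    per_device_eval_batch_size=8,     # x 4 GPUs = 32 total
    gradient_accumulation_steps=8,
    max_steps=3000,
    seed=0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                        # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
)
```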

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 76.0554 | 0.9714 | 17 | 9.4159 |
| 71.5116 | 1.9714 | 34 | 8.9223 |
| 68.1104 | 2.9714 | 51 | 8.4983 |
| 64.8415 | 3.9714 | 68 | 8.0835 |
| 61.5923 | 4.9714 | 85 | 7.6397 |
| 57.4736 | 5.9714 | 102 | 7.1809 |
| 53.9559 | 6.9714 | 119 | 6.7377 |
| 51.5502 | 7.9714 | 136 | 6.4381 |
| 50.2245 | 8.9714 | 153 | 6.2864 |
| 49.7075 | 9.9714 | 170 | 6.1920 |
| 48.9016 | 10.9714 | 187 | 6.1134 |
| 48.5056 | 11.9714 | 204 | 6.0484 |
| 47.7919 | 12.9714 | 221 | 5.9937 |
| 47.6727 | 13.9714 | 238 | 5.9478 |
| 47.0178 | 14.9714 | 255 | 5.9050 |
| 46.7872 | 15.9714 | 272 | 5.8797 |
| 46.6167 | 16.9714 | 289 | 5.8400 |
| 46.0477 | 17.9714 | 306 | 5.8148 |
| 45.8791 | 18.9714 | 323 | 5.7810 |
| 45.4665 | 19.9714 | 340 | 5.7590 |
| 45.567 | 20.9714 | 357 | 5.7255 |
| 45.1798 | 21.9714 | 374 | 5.6946 |
| 44.9232 | 22.9714 | 391 | 5.6630 |
| 44.5422 | 23.9714 | 408 | 5.6316 |
| 44.3638 | 24.9714 | 425 | 5.6000 |
| 43.7262 | 25.9714 | 442 | 5.5667 |
| 43.5779 | 26.9714 | 459 | 5.5362 |
| 43.2879 | 27.9714 | 476 | 5.5000 |
| 42.837 | 28.9714 | 493 | 5.4680 |
| 42.8458 | 29.9714 | 510 | 5.4374 |
| 42.5224 | 30.9714 | 527 | 5.4100 |
| 42.426 | 31.9714 | 544 | 5.3854 |
| 42.0329 | 32.9714 | 561 | 5.3660 |
| 41.6448 | 33.9714 | 578 | 5.3379 |
| 41.5732 | 34.9714 | 595 | 5.3176 |
| 41.415 | 35.9714 | 612 | 5.2978 |
| 41.18 | 36.9714 | 629 | 5.2752 |
| 40.7777 | 37.9714 | 646 | 5.2583 |
| 40.6049 | 38.9714 | 663 | 5.2439 |
| 40.3485 | 39.9714 | 680 | 5.2284 |
| 40.0764 | 40.9714 | 697 | 5.2123 |
| 40.0139 | 41.9714 | 714 | 5.2003 |
| 39.7959 | 42.9714 | 731 | 5.1847 |
| 39.5697 | 43.9714 | 748 | 5.1716 |
| 39.4945 | 44.9714 | 765 | 5.1659 |
| 39.1866 | 45.9714 | 782 | 5.1527 |
| 39.2213 | 46.9714 | 799 | 5.1453 |
| 38.8962 | 47.9714 | 816 | 5.1360 |
| 38.6798 | 48.9714 | 833 | 5.1261 |
| 38.3602 | 49.9714 | 850 | 5.1270 |
| 38.2267 | 50.9714 | 867 | 5.1193 |
| 37.9815 | 51.9714 | 884 | 5.1154 |
| 37.8963 | 52.9714 | 901 | 5.1140 |
| 37.6935 | 53.9714 | 918 | 5.1109 |
| 37.7724 | 54.9714 | 935 | 5.1098 |
| 37.0604 | 55.9714 | 952 | 5.1090 |
| 37.3595 | 56.9714 | 969 | 5.1066 |
| 37.1589 | 57.9714 | 986 | 5.1066 |
| 36.6485 | 58.9714 | 1003 | 5.1044 |
| 36.7598 | 59.9714 | 1020 | 5.1052 |
| 36.5849 | 60.9714 | 1037 | 5.1088 |
| 36.1596 | 61.9714 | 1054 | 5.1110 |
| 36.3068 | 62.9714 | 1071 | 5.1163 |
| 36.3486 | 63.9714 | 1088 | 5.1161 |
| 35.9179 | 64.9714 | 1105 | 5.1214 |
| 35.6792 | 65.9714 | 1122 | 5.1266 |
| 35.4873 | 66.9714 | 1139 | 5.1295 |
| 35.3164 | 67.9714 | 1156 | 5.1387 |
| 35.0348 | 68.9714 | 1173 | 5.1421 |
| 35.1965 | 69.9714 | 1190 | 5.1503 |
| 35.086 | 70.9714 | 1207 | 5.1529 |
| 34.6337 | 71.9714 | 1224 | 5.1637 |
| 34.7382 | 72.9714 | 1241 | 5.1662 |
| 34.4871 | 73.9714 | 1258 | 5.1757 |
| 34.2342 | 74.9714 | 1275 | 5.1806 |
| 34.1668 | 75.9714 | 1292 | 5.1919 |
| 34.0995 | 76.9714 | 1309 | 5.1998 |
| 33.8965 | 77.9714 | 1326 | 5.2114 |
| 33.9098 | 78.9714 | 1343 | 5.2200 |
| 33.63 | 79.9714 | 1360 | 5.2305 |
| 33.4706 | 80.9714 | 1377 | 5.2291 |
| 33.505 | 81.9714 | 1394 | 5.2448 |
| 33.3618 | 82.9714 | 1411 | 5.2457 |
| 33.132 | 83.9714 | 1428 | 5.2597 |
| 33.0071 | 84.9714 | 1445 | 5.2663 |
| 32.8751 | 85.9714 | 1462 | 5.2750 |
| 32.8287 | 86.9714 | 1479 | 5.2870 |
| 32.4965 | 87.9714 | 1496 | 5.2981 |
| 32.6413 | 88.9714 | 1513 | 5.3074 |
| 32.5603 | 89.9714 | 1530 | 5.3142 |
| 32.2966 | 90.9714 | 1547 | 5.3253 |
| 32.1185 | 91.9714 | 1564 | 5.3355 |
| 32.0684 | 92.9714 | 1581 | 5.3424 |
| 32.202 | 93.9714 | 1598 | 5.3535 |
| 31.5632 | 94.9714 | 1615 | 5.3645 |
| 31.565 | 95.9714 | 1632 | 5.3747 |
| 31.4838 | 96.9714 | 1649 | 5.3822 |
| 31.4564 | 97.9714 | 1666 | 5.3918 |
| 31.3305 | 98.9714 | 1683 | 5.3993 |
| 31.3431 | 99.9714 | 1700 | 5.4117 |
| 31.1942 | 100.9714 | 1717 | 5.4160 |
| 30.9246 | 101.9714 | 1734 | 5.4310 |
| 30.8694 | 102.9714 | 1751 | 5.4373 |
| 30.8388 | 103.9714 | 1768 | 5.4432 |
| 30.6456 | 104.9714 | 1785 | 5.4533 |
| 30.5814 | 105.9714 | 1802 | 5.4659 |
| 30.5805 | 106.9714 | 1819 | 5.4699 |
| 30.5545 | 107.9714 | 1836 | 5.4812 |
| 30.4305 | 108.9714 | 1853 | 5.4908 |
| 30.157 | 109.9714 | 1870 | 5.4988 |
| 29.9876 | 110.9714 | 1887 | 5.5073 |
| 30.1266 | 111.9714 | 1904 | 5.5117 |
| 29.8895 | 112.9714 | 1921 | 5.5229 |
| 29.7649 | 113.9714 | 1938 | 5.5295 |
| 29.8926 | 114.9714 | 1955 | 5.5350 |
| 29.6378 | 115.9714 | 1972 | 5.5491 |
| 29.6415 | 116.9714 | 1989 | 5.5559 |
| 29.7529 | 117.9714 | 2006 | 5.5609 |
| 29.3384 | 118.9714 | 2023 | 5.5695 |
| 29.359 | 119.9714 | 2040 | 5.5755 |
| 29.3304 | 120.9714 | 2057 | 5.5825 |
| 29.2433 | 121.9714 | 2074 | 5.5912 |
| 29.0092 | 122.9714 | 2091 | 5.5983 |
| 29.3211 | 123.9714 | 2108 | 5.6037 |
| 29.0934 | 124.9714 | 2125 | 5.6071 |
| 28.9074 | 125.9714 | 2142 | 5.6124 |
| 28.8782 | 126.9714 | 2159 | 5.6202 |
| 28.9611 | 127.9714 | 2176 | 5.6302 |
| 28.8142 | 128.9714 | 2193 | 5.6343 |
| 28.6845 | 129.9714 | 2210 | 5.6432 |
| 28.738 | 130.9714 | 2227 | 5.6447 |
| 28.7464 | 131.9714 | 2244 | 5.6530 |
| 28.6346 | 132.9714 | 2261 | 5.6572 |
| 28.4787 | 133.9714 | 2278 | 5.6630 |
| 28.504 | 134.9714 | 2295 | 5.6695 |
| 28.3428 | 135.9714 | 2312 | 5.6756 |
| 28.3463 | 136.9714 | 2329 | 5.6787 |
| 28.496 | 137.9714 | 2346 | 5.6803 |
| 28.3236 | 138.9714 | 2363 | 5.6871 |
| 28.2455 | 139.9714 | 2380 | 5.6904 |
| 28.2482 | 140.9714 | 2397 | 5.6949 |
| 28.1263 | 141.9714 | 2414 | 5.6998 |
| 28.1918 | 142.9714 | 2431 | 5.7042 |
| 28.1143 | 143.9714 | 2448 | 5.7056 |
| 28.0906 | 144.9714 | 2465 | 5.7094 |
| 28.0002 | 145.9714 | 2482 | 5.7113 |
| 27.8926 | 146.9714 | 2499 | 5.7156 |
| 28.1241 | 147.9714 | 2516 | 5.7199 |
| 28.0328 | 148.9714 | 2533 | 5.7210 |
| 27.9288 | 149.9714 | 2550 | 5.7237 |
| 27.9317 | 150.9714 | 2567 | 5.7265 |
| 27.7598 | 151.9714 | 2584 | 5.7274 |
| 27.9603 | 152.9714 | 2601 | 5.7322 |
| 27.7704 | 153.9714 | 2618 | 5.7333 |
| 27.7615 | 154.9714 | 2635 | 5.7345 |
| 27.7547 | 155.9714 | 2652 | 5.7361 |
| 27.7851 | 156.9714 | 2669 | 5.7389 |
| 27.726 | 157.9714 | 2686 | 5.7394 |
| 27.7772 | 158.9714 | 2703 | 5.7407 |
| 27.8406 | 159.9714 | 2720 | 5.7429 |
| 27.7328 | 160.9714 | 2737 | 5.7429 |
| 27.6921 | 161.9714 | 2754 | 5.7444 |
| 27.6875 | 162.9714 | 2771 | 5.7458 |
| 27.8319 | 163.9714 | 2788 | 5.7465 |
| 27.7842 | 164.9714 | 2805 | 5.7467 |
| 27.6917 | 165.9714 | 2822 | 5.7481 |
| 27.6102 | 166.9714 | 2839 | 5.7489 |
| 27.8374 | 167.9714 | 2856 | 5.7481 |
| 27.6557 | 168.9714 | 2873 | 5.7492 |
| 27.5827 | 169.9714 | 2890 | 5.7491 |
| 27.7106 | 170.9714 | 2907 | 5.7492 |
| 27.6843 | 171.9714 | 2924 | 5.7496 |
| 27.6455 | 172.9714 | 2941 | 5.7494 |
| 27.6835 | 173.9714 | 2958 | 5.7496 |
| 27.6126 | 174.9714 | 2975 | 5.7497 |
| 27.6715 | 175.9714 | 2992 | 5.7497 |
| 27.734 | 176.4571 | 3000 | 5.7497 |
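
Note that the validation loss bottoms out around 5.10 near step 1000 (epoch ~59) and then climbs steadily back to 5.75 while the training loss keeps falling, i.e. the final checkpoint is well past the point of overfitting. If retraining, one way to keep the best checkpoint instead of the last is the standard Trainer options sketched below (model and datasets are placeholders, not the authors' setup):

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Sketch: retain the checkpoint with the lowest validation loss and stop
# once it has not improved for 5 consecutive evaluations.
args = TrainingArguments(
    output_dir="out",                    # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,                         # placeholder model
    args=args,
    train_dataset=train_ds,              # placeholder datasets
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)
```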

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
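
If reproducing results, pinning to the versions above avoids behavioral drift; a quick environment check might look like:

```python
# Print installed versions to compare against the list above.
import transformers, torch, datasets, tokenizers

for mod, expected in [(transformers, "4.49.0"), (torch, "2.4.0+cu121"),
                      (datasets, "3.4.0"), (tokenizers, "0.21.0")]:
    print(f"{mod.__name__}: {mod.__version__} (card lists {expected})")
```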
Model size

126M parameters (safetensors, F32)