impossible-llms-german-natural

This model is a fine-tuned version of an unspecified base model (not recorded in this card), trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.4432
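
For reference, assuming the reported loss is token-level cross-entropy in nats, this corresponds to a perplexity of exp(5.4432) ≈ 231. Note that label smoothing (factor 0.1, see the hyperparameters below) inflates the reported loss relative to unsmoothed cross-entropy, so the effective perplexity is somewhat lower. A minimal check:

```python
import math

# Perplexity from a cross-entropy loss reported in nats.
print(math.exp(5.4432))  # ≈ 231.1
```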

Model description

A 126M-parameter model stored as full-precision (F32) weights in Safetensors format; further details needed.

Intended uses & limitations

More information needed
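
Pending fuller documentation, the checkpoint can presumably be loaded through the standard Transformers Auto classes. A minimal sketch, assuming a causal language-model head (the architecture is not stated in this card) and using the Hub repository ID from the collection listing:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-german-natural"

# The causal-LM head is an assumption; swap in the matching Auto class
# if the underlying architecture differs.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Der Hund läuft", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```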

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstructed `TrainingArguments` sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
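
The effective batch sizes follow from the per-device values: 12 × 4 GPUs × 8 accumulation steps = 384 for training, and 8 × 4 GPUs = 32 for evaluation. As a rough reconstruction, the configuration above maps onto the following `transformers.TrainingArguments`; `output_dir` is a placeholder, and this sketch is inferred from the list rather than taken from the original training script:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="impossible-llms-german-natural",  # placeholder, not from the original run
    learning_rate=1e-4,
    per_device_train_batch_size=12,  # × 4 GPUs × 8 accumulation steps = 384 effective
    per_device_eval_batch_size=8,    # × 4 GPUs = 32 effective
    gradient_accumulation_steps=8,
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # native AMP mixed precision
    label_smoothing_factor=0.1,
)
```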

Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 37.9888       | 1.0      | 17   | 9.3528          |
| 35.7236       | 2.0      | 34   | 8.8980          |
| 34.2014       | 3.0      | 51   | 8.4616          |
| 32.4149       | 4.0      | 68   | 8.0069          |
| 30.2587       | 5.0      | 85   | 7.5168          |
| 28.432        | 6.0      | 102  | 7.0356          |
| 26.6483       | 7.0      | 119  | 6.6363          |
| 25.5771       | 8.0      | 136  | 6.3366          |
| 24.564        | 9.0      | 153  | 6.1533          |
| 24.2837       | 10.0     | 170  | 6.0317          |
| 23.732        | 11.0     | 187  | 5.9404          |
| 23.4501       | 12.0     | 204  | 5.8571          |
| 23.0614       | 13.0     | 221  | 5.7827          |
| 22.8862       | 14.0     | 238  | 5.7164          |
| 22.6625       | 15.0     | 255  | 5.6735          |
| 22.3893       | 16.0     | 272  | 5.6288          |
| 22.3211       | 17.0     | 289  | 5.5947          |
| 21.8821       | 18.0     | 306  | 5.5613          |
| 21.9518       | 19.0     | 323  | 5.5305          |
| 21.6337       | 20.0     | 340  | 5.4958          |
| 21.575        | 21.0     | 357  | 5.4622          |
| 21.4181       | 22.0     | 374  | 5.4385          |
| 21.1835       | 23.0     | 391  | 5.3967          |
| 21.1219       | 24.0     | 408  | 5.3637          |
| 20.8773       | 25.0     | 425  | 5.3191          |
| 20.8831       | 26.0     | 442  | 5.2788          |
| 20.819        | 27.0     | 459  | 5.2407          |
| 20.3499       | 28.0     | 476  | 5.1967          |
| 20.2114       | 29.0     | 493  | 5.1636          |
| 20.1814       | 30.0     | 510  | 5.1276          |
| 19.9462       | 31.0     | 527  | 5.0974          |
| 19.8372       | 32.0     | 544  | 5.0664          |
| 19.5911       | 33.0     | 561  | 5.0434          |
| 19.4307       | 34.0     | 578  | 5.0179          |
| 19.5706       | 35.0     | 595  | 4.9894          |
| 19.1388       | 36.0     | 612  | 4.9639          |
| 19.2116       | 37.0     | 629  | 4.9503          |
| 18.9487       | 38.0     | 646  | 4.9288          |
| 18.9178       | 39.0     | 663  | 4.9106          |
| 18.7411       | 40.0     | 680  | 4.8942          |
| 18.4311       | 41.0     | 697  | 4.8825          |
| 18.508        | 42.0     | 714  | 4.8720          |
| 18.271        | 43.0     | 731  | 4.8577          |
| 18.2744       | 44.0     | 748  | 4.8448          |
| 18.2021       | 45.0     | 765  | 4.8357          |
| 18.1162       | 46.0     | 782  | 4.8269          |
| 17.8795       | 47.0     | 799  | 4.8207          |
| 17.8161       | 48.0     | 816  | 4.8136          |
| 17.8535       | 49.0     | 833  | 4.8083          |
| 17.7169       | 50.0     | 850  | 4.8033          |
| 17.608        | 51.0     | 867  | 4.8007          |
| 17.5039       | 52.0     | 884  | 4.7972          |
| 17.3895       | 53.0     | 901  | 4.7982          |
| 17.4142       | 54.0     | 918  | 4.7921          |
| 17.1677       | 55.0     | 935  | 4.7970          |
| 17.3322       | 56.0     | 952  | 4.7931          |
| 17.0638       | 57.0     | 969  | 4.7936          |
| 16.9893       | 58.0     | 986  | 4.7999          |
| 17.0627       | 59.0     | 1003 | 4.7984          |
| 16.8005       | 60.0     | 1020 | 4.8026          |
| 16.822        | 61.0     | 1037 | 4.8068          |
| 16.7404       | 62.0     | 1054 | 4.8078          |
| 16.6783       | 63.0     | 1071 | 4.8149          |
| 16.4598       | 64.0     | 1088 | 4.8199          |
| 16.4507       | 65.0     | 1105 | 4.8224          |
| 16.4699       | 66.0     | 1122 | 4.8311          |
| 16.328        | 67.0     | 1139 | 4.8368          |
| 16.2451       | 68.0     | 1156 | 4.8419          |
| 16.049        | 69.0     | 1173 | 4.8491          |
| 15.9962       | 70.0     | 1190 | 4.8596          |
| 15.9141       | 71.0     | 1207 | 4.8603          |
| 15.8765       | 72.0     | 1224 | 4.8684          |
| 15.7415       | 73.0     | 1241 | 4.8785          |
| 15.6619       | 74.0     | 1258 | 4.8863          |
| 15.6196       | 75.0     | 1275 | 4.8958          |
| 15.5707       | 76.0     | 1292 | 4.8993          |
| 15.5836       | 77.0     | 1309 | 4.9112          |
| 15.6174       | 78.0     | 1326 | 4.9182          |
| 15.4552       | 79.0     | 1343 | 4.9274          |
| 15.2093       | 80.0     | 1360 | 4.9386          |
| 15.1343       | 81.0     | 1377 | 4.9440          |
| 15.2186       | 82.0     | 1394 | 4.9598          |
| 15.2129       | 83.0     | 1411 | 4.9629          |
| 15.0068       | 84.0     | 1428 | 4.9771          |
| 15.0696       | 85.0     | 1445 | 4.9844          |
| 15.0001       | 86.0     | 1462 | 4.9905          |
| 14.8185       | 87.0     | 1479 | 5.0014          |
| 14.7553       | 88.0     | 1496 | 5.0119          |
| 14.795        | 89.0     | 1513 | 5.0189          |
| 14.6741       | 90.0     | 1530 | 5.0302          |
| 14.6173       | 91.0     | 1547 | 5.0378          |
| 14.7111       | 92.0     | 1564 | 5.0510          |
| 14.5775       | 93.0     | 1581 | 5.0556          |
| 14.4773       | 94.0     | 1598 | 5.0610          |
| 14.3649       | 95.0     | 1615 | 5.0779          |
| 14.3988       | 96.0     | 1632 | 5.0831          |
| 14.279        | 97.0     | 1649 | 5.0979          |
| 14.2782       | 98.0     | 1666 | 5.1033          |
| 14.2275       | 99.0     | 1683 | 5.1101          |
| 14.2227       | 100.0    | 1700 | 5.1182          |
| 14.0224       | 101.0    | 1717 | 5.1299          |
| 13.9125       | 102.0    | 1734 | 5.1431          |
| 14.006        | 103.0    | 1751 | 5.1446          |
| 14.117        | 104.0    | 1768 | 5.1531          |
| 13.907        | 105.0    | 1785 | 5.1660          |
| 13.9161       | 106.0    | 1802 | 5.1734          |
| 13.8795       | 107.0    | 1819 | 5.1808          |
| 13.9134       | 108.0    | 1836 | 5.1904          |
| 13.747        | 109.0    | 1853 | 5.1948          |
| 13.7201       | 110.0    | 1870 | 5.2076          |
| 13.7543       | 111.0    | 1887 | 5.2126          |
| 13.6064       | 112.0    | 1904 | 5.2242          |
| 13.5217       | 113.0    | 1921 | 5.2271          |
| 13.5011       | 114.0    | 1938 | 5.2416          |
| 13.418        | 115.0    | 1955 | 5.2455          |
| 13.4843       | 116.0    | 1972 | 5.2541          |
| 13.5348       | 117.0    | 1989 | 5.2601          |
| 13.3599       | 118.0    | 2006 | 5.2655          |
| 13.3885       | 119.0    | 2023 | 5.2753          |
| 13.279        | 120.0    | 2040 | 5.2775          |
| 13.2548       | 121.0    | 2057 | 5.2872          |
| 13.2727       | 122.0    | 2074 | 5.2934          |
| 13.2849       | 123.0    | 2091 | 5.3007          |
| 13.187        | 124.0    | 2108 | 5.3057          |
| 13.2051       | 125.0    | 2125 | 5.3085          |
| 13.2568       | 126.0    | 2142 | 5.3204          |
| 13.1217       | 127.0    | 2159 | 5.3282          |
| 13.1062       | 128.0    | 2176 | 5.3304          |
| 12.9647       | 129.0    | 2193 | 5.3365          |
| 13.0156       | 130.0    | 2210 | 5.3412          |
| 13.0614       | 131.0    | 2227 | 5.3483          |
| 13.0743       | 132.0    | 2244 | 5.3503          |
| 13.0518       | 133.0    | 2261 | 5.3565          |
| 12.9692       | 134.0    | 2278 | 5.3621          |
| 12.9269       | 135.0    | 2295 | 5.3694          |
| 13.0005       | 136.0    | 2312 | 5.3696          |
| 13.007        | 137.0    | 2329 | 5.3748          |
| 12.8546       | 138.0    | 2346 | 5.3792          |
| 12.8133       | 139.0    | 2363 | 5.3816          |
| 12.7968       | 140.0    | 2380 | 5.3835          |
| 12.8415       | 141.0    | 2397 | 5.3892          |
| 12.7869       | 142.0    | 2414 | 5.3958          |
| 12.8142       | 143.0    | 2431 | 5.3998          |
| 12.7874       | 144.0    | 2448 | 5.4027          |
| 12.8177       | 145.0    | 2465 | 5.4072          |
| 12.6922       | 146.0    | 2482 | 5.4084          |
| 12.7024       | 147.0    | 2499 | 5.4110          |
| 12.6719       | 148.0    | 2516 | 5.4127          |
| 12.6329       | 149.0    | 2533 | 5.4172          |
| 12.7134       | 150.0    | 2550 | 5.4188          |
| 12.6052       | 151.0    | 2567 | 5.4206          |
| 12.6019       | 152.0    | 2584 | 5.4240          |
| 12.6619       | 153.0    | 2601 | 5.4265          |
| 12.6578       | 154.0    | 2618 | 5.4256          |
| 12.7177       | 155.0    | 2635 | 5.4290          |
| 12.5901       | 156.0    | 2652 | 5.4316          |
| 12.5928       | 157.0    | 2669 | 5.4327          |
| 12.5668       | 158.0    | 2686 | 5.4348          |
| 12.647        | 159.0    | 2703 | 5.4355          |
| 12.5917       | 160.0    | 2720 | 5.4366          |
| 12.5569       | 161.0    | 2737 | 5.4383          |
| 12.6728       | 162.0    | 2754 | 5.4397          |
| 12.5257       | 163.0    | 2771 | 5.4399          |
| 12.621        | 164.0    | 2788 | 5.4407          |
| 12.5516       | 165.0    | 2805 | 5.4409          |
| 12.6494       | 166.0    | 2822 | 5.4416          |
| 12.4839       | 167.0    | 2839 | 5.4426          |
| 12.6043       | 168.0    | 2856 | 5.4420          |
| 12.6117       | 169.0    | 2873 | 5.4424          |
| 12.6067       | 170.0    | 2890 | 5.4429          |
| 12.5171       | 171.0    | 2907 | 5.4432          |
| 12.5504       | 172.0    | 2924 | 5.4432          |
| 12.4739       | 173.0    | 2941 | 5.4429          |
| 12.5461       | 174.0    | 2958 | 5.4433          |
| 12.4804       | 175.0    | 2975 | 5.4432          |
| 12.5913       | 176.0    | 2992 | 5.4432          |
| 25.0053       | 176.4848 | 3000 | 5.4432          |
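
Validation loss bottoms out at roughly 4.79 around epochs 54-57 (steps 918-969) and rises steadily afterward while training loss keeps falling, so the final checkpoint reported above is likely overfit relative to a mid-training checkpoint. The training-loss value in the last row covers only the short partial epoch up to step 3000 and is not directly comparable to the full-epoch averages.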

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0