impossible-llms-german-random

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 6.5580
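
For context, here is a minimal sketch of how this loss maps to perplexity, assuming the reported value is the mean token-level cross-entropy in nats (note that the label smoothing of 0.1 used during training inflates the raw loss somewhat):

```python
import math

# Perplexity is exp(cross-entropy) when the loss is the mean negative
# log-likelihood per token in nats. Assumes the reported eval loss
# (6.5580) is exactly that quantity.
eval_loss = 6.5580
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # ≈ 704.9
```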

Model description

More information needed

Intended uses & limitations

More information needed
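
Since the card leaves usage unspecified, here is a minimal loading sketch. It assumes this is a causal language model usable through the Transformers auto classes, and that the repo id is IParraMartin/impossible-llms-german-random (taken from the collection name on the original page); neither is stated in the card itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; the model card itself does not state one.
repo_id = "IParraMartin/impossible-llms-german-random"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Score a short German sample with the model's own loss.
inputs = tokenizer("Der Hund schläft unter dem Tisch.", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```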

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
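
For reproducibility, the list above corresponds roughly to the following Transformers `TrainingArguments`. This is a hedged reconstruction, not the authors' actual training script; `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="impossible-llms-german-random",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,   # × 4 GPUs × 8 accumulation steps = 384 total
    per_device_eval_batch_size=8,     # × 4 GPUs = 32 total
    seed=0,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                        # "Native AMP"; could also have been bf16
    label_smoothing_factor=0.1,
)
```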

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 28.22 | 1.0 | 18 | 9.3927 |
| 26.7279 | 2.0 | 36 | 8.9499 |
| 25.6547 | 3.0 | 54 | 8.5507 |
| 24.3808 | 4.0 | 72 | 8.1185 |
| 22.881 | 5.0 | 90 | 7.6329 |
| 21.4807 | 6.0 | 108 | 7.1485 |
| 20.4576 | 7.0 | 126 | 6.7649 |
| 19.6282 | 8.0 | 144 | 6.5364 |
| 19.1997 | 9.0 | 162 | 6.4326 |
| 18.861 | 10.0 | 180 | 6.3683 |
| 18.7922 | 11.0 | 198 | 6.3189 |
| 18.9556 | 12.0 | 216 | 6.2618 |
| 18.5314 | 13.0 | 234 | 6.2134 |
| 18.5663 | 14.0 | 252 | 6.1788 |
| 18.4854 | 15.0 | 270 | 6.1504 |
| 18.1668 | 16.0 | 288 | 6.1251 |
| 18.3746 | 17.0 | 306 | 6.1035 |
| 18.0233 | 18.0 | 324 | 6.0776 |
| 18.0164 | 19.0 | 342 | 6.0475 |
| 17.5869 | 20.0 | 360 | 6.0198 |
| 17.8081 | 21.0 | 378 | 5.9977 |
| 17.5177 | 22.0 | 396 | 5.9725 |
| 17.3173 | 23.0 | 414 | 5.9435 |
| 17.5747 | 24.0 | 432 | 5.9193 |
| 17.557 | 25.0 | 450 | 5.8967 |
| 17.2291 | 26.0 | 468 | 5.8753 |
| 17.422 | 27.0 | 486 | 5.8524 |
| 17.1727 | 28.0 | 504 | 5.8351 |
| 17.0568 | 29.0 | 522 | 5.8191 |
| 17.0573 | 30.0 | 540 | 5.8014 |
| 16.768 | 31.0 | 558 | 5.7869 |
| 16.6649 | 32.0 | 576 | 5.7707 |
| 16.6858 | 33.0 | 594 | 5.7623 |
| 16.8468 | 34.0 | 612 | 5.7485 |
| 16.8416 | 35.0 | 630 | 5.7400 |
| 16.5872 | 36.0 | 648 | 5.7308 |
| 16.4419 | 37.0 | 666 | 5.7204 |
| 16.4949 | 38.0 | 684 | 5.7156 |
| 16.4446 | 39.0 | 702 | 5.7104 |
| 16.1716 | 40.0 | 720 | 5.7046 |
| 16.2623 | 41.0 | 738 | 5.6975 |
| 16.177 | 42.0 | 756 | 5.6957 |
| 16.2276 | 43.0 | 774 | 5.6947 |
| 15.9746 | 44.0 | 792 | 5.6945 |
| 15.9396 | 45.0 | 810 | 5.6915 |
| 16.0085 | 46.0 | 828 | 5.6887 |
| 15.9081 | 47.0 | 846 | 5.6941 |
| 15.7286 | 48.0 | 864 | 5.6940 |
| 15.7661 | 49.0 | 882 | 5.6973 |
| 15.6632 | 50.0 | 900 | 5.6976 |
| 15.8302 | 51.0 | 918 | 5.7014 |
| 15.735 | 52.0 | 936 | 5.7084 |
| 15.5396 | 53.0 | 954 | 5.7129 |
| 15.2283 | 54.0 | 972 | 5.7151 |
| 15.3666 | 55.0 | 990 | 5.7200 |
| 15.3172 | 56.0 | 1008 | 5.7237 |
| 15.208 | 57.0 | 1026 | 5.7329 |
| 15.4495 | 58.0 | 1044 | 5.7408 |
| 15.0151 | 59.0 | 1062 | 5.7522 |
| 15.2973 | 60.0 | 1080 | 5.7577 |
| 15.0306 | 61.0 | 1098 | 5.7656 |
| 14.9065 | 62.0 | 1116 | 5.7721 |
| 14.8622 | 63.0 | 1134 | 5.7840 |
| 15.0454 | 64.0 | 1152 | 5.7948 |
| 14.6227 | 65.0 | 1170 | 5.8064 |
| 14.661 | 66.0 | 1188 | 5.8118 |
| 14.5349 | 67.0 | 1206 | 5.8224 |
| 14.577 | 68.0 | 1224 | 5.8362 |
| 14.4111 | 69.0 | 1242 | 5.8465 |
| 14.324 | 70.0 | 1260 | 5.8550 |
| 14.3226 | 71.0 | 1278 | 5.8667 |
| 14.3473 | 72.0 | 1296 | 5.8785 |
| 14.293 | 73.0 | 1314 | 5.8909 |
| 14.3887 | 74.0 | 1332 | 5.9039 |
| 14.3544 | 75.0 | 1350 | 5.9148 |
| 14.1185 | 76.0 | 1368 | 5.9321 |
| 14.0668 | 77.0 | 1386 | 5.9423 |
| 14.1742 | 78.0 | 1404 | 5.9526 |
| 13.8125 | 79.0 | 1422 | 5.9667 |
| 13.9209 | 80.0 | 1440 | 5.9815 |
| 13.8013 | 81.0 | 1458 | 5.9925 |
| 13.8491 | 82.0 | 1476 | 6.0041 |
| 13.6533 | 83.0 | 1494 | 6.0213 |
| 13.6402 | 84.0 | 1512 | 6.0318 |
| 13.5231 | 85.0 | 1530 | 6.0379 |
| 13.5759 | 86.0 | 1548 | 6.0539 |
| 13.4849 | 87.0 | 1566 | 6.0718 |
| 13.4543 | 88.0 | 1584 | 6.0866 |
| 13.2973 | 89.0 | 1602 | 6.0971 |
| 13.2578 | 90.0 | 1620 | 6.1055 |
| 13.3011 | 91.0 | 1638 | 6.1199 |
| 13.163 | 92.0 | 1656 | 6.1289 |
| 13.178 | 93.0 | 1674 | 6.1399 |
| 13.3033 | 94.0 | 1692 | 6.1526 |
| 13.318 | 95.0 | 1710 | 6.1627 |
| 12.9948 | 96.0 | 1728 | 6.1788 |
| 12.9897 | 97.0 | 1746 | 6.1916 |
| 13.0252 | 98.0 | 1764 | 6.2004 |
| 13.0065 | 99.0 | 1782 | 6.2176 |
| 12.9862 | 100.0 | 1800 | 6.2253 |
| 12.9079 | 101.0 | 1818 | 6.2351 |
| 12.9666 | 102.0 | 1836 | 6.2442 |
| 12.8916 | 103.0 | 1854 | 6.2537 |
| 12.7161 | 104.0 | 1872 | 6.2634 |
| 12.8223 | 105.0 | 1890 | 6.2823 |
| 12.6665 | 106.0 | 1908 | 6.2882 |
| 12.6533 | 107.0 | 1926 | 6.2976 |
| 12.5934 | 108.0 | 1944 | 6.3050 |
| 12.6871 | 109.0 | 1962 | 6.3143 |
| 12.6676 | 110.0 | 1980 | 6.3266 |
| 12.3628 | 111.0 | 1998 | 6.3319 |
| 12.6313 | 112.0 | 2016 | 6.3413 |
| 12.328 | 113.0 | 2034 | 6.3557 |
| 12.4958 | 114.0 | 2052 | 6.3611 |
| 12.5022 | 115.0 | 2070 | 6.3673 |
| 12.3673 | 116.0 | 2088 | 6.3787 |
| 12.3613 | 117.0 | 2106 | 6.3904 |
| 12.4782 | 118.0 | 2124 | 6.3973 |
| 12.3523 | 119.0 | 2142 | 6.4094 |
| 12.1093 | 120.0 | 2160 | 6.4166 |
| 12.2079 | 121.0 | 2178 | 6.4194 |
| 12.1763 | 122.0 | 2196 | 6.4229 |
| 12.1662 | 123.0 | 2214 | 6.4335 |
| 12.256 | 124.0 | 2232 | 6.4407 |
| 12.1079 | 125.0 | 2250 | 6.4467 |
| 12.1875 | 126.0 | 2268 | 6.4554 |
| 12.1299 | 127.0 | 2286 | 6.4605 |
| 11.9807 | 128.0 | 2304 | 6.4662 |
| 12.09 | 129.0 | 2322 | 6.4693 |
| 11.9597 | 130.0 | 2340 | 6.4760 |
| 11.9942 | 131.0 | 2358 | 6.4824 |
| 12.0525 | 132.0 | 2376 | 6.4904 |
| 12.0041 | 133.0 | 2394 | 6.4914 |
| 11.9356 | 134.0 | 2412 | 6.4995 |
| 12.07 | 135.0 | 2430 | 6.5004 |
| 11.8729 | 136.0 | 2448 | 6.5052 |
| 11.9321 | 137.0 | 2466 | 6.5094 |
| 11.9413 | 138.0 | 2484 | 6.5114 |
| 11.9385 | 139.0 | 2502 | 6.5158 |
| 11.8975 | 140.0 | 2520 | 6.5230 |
| 11.7267 | 141.0 | 2538 | 6.5264 |
| 11.9369 | 142.0 | 2556 | 6.5283 |
| 11.8706 | 143.0 | 2574 | 6.5292 |
| 11.7837 | 144.0 | 2592 | 6.5352 |
| 11.6952 | 145.0 | 2610 | 6.5372 |
| 11.8134 | 146.0 | 2628 | 6.5397 |
| 11.7266 | 147.0 | 2646 | 6.5439 |
| 11.7295 | 148.0 | 2664 | 6.5441 |
| 11.5708 | 149.0 | 2682 | 6.5456 |
| 11.7177 | 150.0 | 2700 | 6.5479 |
| 11.7275 | 151.0 | 2718 | 6.5502 |
| 11.6484 | 152.0 | 2736 | 6.5518 |
| 11.8526 | 153.0 | 2754 | 6.5529 |
| 11.7822 | 154.0 | 2772 | 6.5530 |
| 11.8092 | 155.0 | 2790 | 6.5533 |
| 11.6373 | 156.0 | 2808 | 6.5540 |
| 11.7904 | 157.0 | 2826 | 6.5569 |
| 11.7237 | 158.0 | 2844 | 6.5575 |
| 11.6092 | 159.0 | 2862 | 6.5569 |
| 11.6797 | 160.0 | 2880 | 6.5569 |
| 11.7197 | 161.0 | 2898 | 6.5575 |
| 11.9107 | 162.0 | 2916 | 6.5581 |
| 11.7386 | 163.0 | 2934 | 6.5578 |
| 11.7278 | 164.0 | 2952 | 6.5580 |
| 11.8036 | 165.0 | 2970 | 6.5580 |
| 11.6859 | 166.0 | 2988 | 6.5580 |
| 31.2663 | 166.6906 | 3000 | 6.5580 |
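
Note that validation loss bottoms out at 5.6887 around epoch 46 and climbs steadily afterwards, so the final checkpoint is not the best one by this metric. A tiny sketch of how to spot this from the curve (values copied from the table above, abbreviated to a few points):

```python
# (epoch, validation loss) points sampled from the table above.
val_loss = {
    10: 6.3683,
    46: 5.6887,   # minimum of the full table
    100: 6.2253,
    166: 6.5580,  # final full epoch
}
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # -> 46 5.6887
```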

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
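
To check that a local environment matches these pins, a small version-check sketch (assumes all four packages are installed):

```python
import datasets
import tokenizers
import torch
import transformers

# Expected versions from the list above.
expected = {
    "transformers": "4.49.0",
    "torch": "2.4.0+cu121",
    "datasets": "3.4.0",
    "tokenizers": "0.21.0",
}
for name, mod in [("transformers", transformers), ("torch", torch),
                  ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(name, mod.__version__, "(expected", expected[name] + ")")
```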

Model size

  • 126M parameters (F32, Safetensors)