impossible-llms-german-random-fourgram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.9430
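
For context, if this value is the mean token-level cross-entropy, it corresponds to a perplexity of roughly exp(5.9430) ≈ 381. Treat this as a rough conversion only: training used a label-smoothing factor of 0.1 (see the hyperparameters below), which inflates the reported loss relative to un-smoothed cross-entropy. A minimal sketch of the arithmetic:

```python
import math

# Rough perplexity from the reported evaluation loss, assuming it is the
# mean token-level cross-entropy. Label smoothing (0.1) inflates this value,
# so treat the result as an upper-bound estimate rather than a true perplexity.
eval_loss = 5.9430
print(f"approx. perplexity: {math.exp(eval_loss):.1f}")  # ~381
```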

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
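
The settings above can be expressed as Hugging Face TrainingArguments. The sketch below is not the authors' training script: output_dir and fp16 (for "Native AMP") are assumptions, and the 4-GPU distributed launch would be handled externally (e.g. via torchrun or accelerate).

```python
from transformers import TrainingArguments

# A minimal sketch of the reported hyperparameters as TrainingArguments.
# output_dir and fp16 are assumptions; multi-GPU launch (4 devices) is external.
training_args = TrainingArguments(
    output_dir="impossible-llms-german-random-fourgram",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 12 per device x 4 GPUs x 8 = 384 effective batch
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    label_smoothing_factor=0.1,
    fp16=True,                       # assumed mapping of "Native AMP" mixed precision
)
```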

Training results

Training Loss | Epoch | Step | Validation Loss
------------- | ----- | ---- | ---------------
37.7338 | 1.0 | 18 | 9.3648
35.7624 | 2.0 | 36 | 8.8937
33.8558 | 3.0 | 54 | 8.4733
32.0947 | 4.0 | 72 | 8.0390
30.1266 | 5.0 | 90 | 7.5550
28.1882 | 6.0 | 108 | 7.0654
26.5774 | 7.0 | 126 | 6.6686
25.9146 | 8.0 | 144 | 6.4362
25.232 | 9.0 | 162 | 6.3227
24.8537 | 10.0 | 180 | 6.2398
24.4154 | 11.0 | 198 | 6.1600
24.228 | 12.0 | 216 | 6.1035
24.0803 | 13.0 | 234 | 6.0586
24.0885 | 14.0 | 252 | 6.0192
23.6898 | 15.0 | 270 | 5.9871
23.4396 | 16.0 | 288 | 5.9482
23.4791 | 17.0 | 306 | 5.9201
23.2907 | 18.0 | 324 | 5.8843
23.3204 | 19.0 | 342 | 5.8601
22.9215 | 20.0 | 360 | 5.8303
22.8381 | 21.0 | 378 | 5.7999
22.9067 | 22.0 | 396 | 5.7706
22.4945 | 23.0 | 414 | 5.7395
22.5825 | 24.0 | 432 | 5.7105
22.2327 | 25.0 | 450 | 5.6771
22.1948 | 26.0 | 468 | 5.6481
22.1161 | 27.0 | 486 | 5.6131
21.7786 | 28.0 | 504 | 5.5857
21.608 | 29.0 | 522 | 5.5584
21.6092 | 30.0 | 540 | 5.5294
21.5158 | 31.0 | 558 | 5.5099
21.2812 | 32.0 | 576 | 5.4840
21.1324 | 33.0 | 594 | 5.4645
21.0367 | 34.0 | 612 | 5.4435
20.8917 | 35.0 | 630 | 5.4239
20.8304 | 36.0 | 648 | 5.4095
20.5685 | 37.0 | 666 | 5.3899
20.7227 | 38.0 | 684 | 5.3780
20.5019 | 39.0 | 702 | 5.3647
20.237 | 40.0 | 720 | 5.3504
20.2962 | 41.0 | 738 | 5.3395
20.1685 | 42.0 | 756 | 5.3290
20.0126 | 43.0 | 774 | 5.3224
19.9712 | 44.0 | 792 | 5.3135
20.0726 | 45.0 | 810 | 5.3062
19.7299 | 46.0 | 828 | 5.2992
19.623 | 47.0 | 846 | 5.2930
19.6938 | 48.0 | 864 | 5.2912
19.5416 | 49.0 | 882 | 5.2889
19.6437 | 50.0 | 900 | 5.2886
19.3772 | 51.0 | 918 | 5.2847
19.1646 | 52.0 | 936 | 5.2846
19.3211 | 53.0 | 954 | 5.2805
18.9739 | 54.0 | 972 | 5.2832
18.8837 | 55.0 | 990 | 5.2825
18.9044 | 56.0 | 1008 | 5.2842
18.9384 | 57.0 | 1026 | 5.2849
18.8621 | 58.0 | 1044 | 5.2873
18.7566 | 59.0 | 1062 | 5.2925
18.6145 | 60.0 | 1080 | 5.2943
18.4163 | 61.0 | 1098 | 5.3036
18.3534 | 62.0 | 1116 | 5.3079
18.502 | 63.0 | 1134 | 5.3107
18.2143 | 64.0 | 1152 | 5.3149
18.338 | 65.0 | 1170 | 5.3235
18.1699 | 66.0 | 1188 | 5.3293
18.1661 | 67.0 | 1206 | 5.3375
18.0224 | 68.0 | 1224 | 5.3471
17.6682 | 69.0 | 1242 | 5.3529
17.8052 | 70.0 | 1260 | 5.3615
17.5968 | 71.0 | 1278 | 5.3675
17.4778 | 72.0 | 1296 | 5.3773
17.6043 | 73.0 | 1314 | 5.3858
17.4908 | 74.0 | 1332 | 5.3964
17.3254 | 75.0 | 1350 | 5.4018
17.4003 | 76.0 | 1368 | 5.4124
17.3649 | 77.0 | 1386 | 5.4241
17.2245 | 78.0 | 1404 | 5.4301
17.3074 | 79.0 | 1422 | 5.4462
17.1521 | 80.0 | 1440 | 5.4551
16.811 | 81.0 | 1458 | 5.4611
16.9251 | 82.0 | 1476 | 5.4712
16.8495 | 83.0 | 1494 | 5.4820
16.9539 | 84.0 | 1512 | 5.4906
16.692 | 85.0 | 1530 | 5.4990
16.7383 | 86.0 | 1548 | 5.5123
16.6054 | 87.0 | 1566 | 5.5219
16.5755 | 88.0 | 1584 | 5.5302
16.4844 | 89.0 | 1602 | 5.5382
16.3689 | 90.0 | 1620 | 5.5526
16.3734 | 91.0 | 1638 | 5.5635
16.2672 | 92.0 | 1656 | 5.5733
16.2833 | 93.0 | 1674 | 5.5851
16.2626 | 94.0 | 1692 | 5.5924
16.1087 | 95.0 | 1710 | 5.6025
16.3242 | 96.0 | 1728 | 5.6100
16.0916 | 97.0 | 1746 | 5.6241
16.0279 | 98.0 | 1764 | 5.6313
15.9436 | 99.0 | 1782 | 5.6448
15.9385 | 100.0 | 1800 | 5.6488
15.9182 | 101.0 | 1818 | 5.6596
15.6631 | 102.0 | 1836 | 5.6721
15.7364 | 103.0 | 1854 | 5.6818
15.8297 | 104.0 | 1872 | 5.6852
15.7784 | 105.0 | 1890 | 5.6973
15.676 | 106.0 | 1908 | 5.7030
15.6292 | 107.0 | 1926 | 5.7139
15.5341 | 108.0 | 1944 | 5.7238
15.5014 | 109.0 | 1962 | 5.7322
15.4952 | 110.0 | 1980 | 5.7425
15.2674 | 111.0 | 1998 | 5.7469
15.3804 | 112.0 | 2016 | 5.7527
15.1628 | 113.0 | 2034 | 5.7648
15.2465 | 114.0 | 2052 | 5.7694
15.265 | 115.0 | 2070 | 5.7807
15.2491 | 116.0 | 2088 | 5.7871
15.187 | 117.0 | 2106 | 5.7928
15.2305 | 118.0 | 2124 | 5.8011
15.0624 | 119.0 | 2142 | 5.8061
15.0283 | 120.0 | 2160 | 5.8126
15.0128 | 121.0 | 2178 | 5.8241
14.9024 | 122.0 | 2196 | 5.8275
15.0507 | 123.0 | 2214 | 5.8336
14.9945 | 124.0 | 2232 | 5.8384
14.9695 | 125.0 | 2250 | 5.8458
14.9255 | 126.0 | 2268 | 5.8542
14.8809 | 127.0 | 2286 | 5.8562
14.8236 | 128.0 | 2304 | 5.8621
14.715 | 129.0 | 2322 | 5.8650
14.7913 | 130.0 | 2340 | 5.8739
14.8002 | 131.0 | 2358 | 5.8759
14.7913 | 132.0 | 2376 | 5.8817
14.7532 | 133.0 | 2394 | 5.8849
14.7149 | 134.0 | 2412 | 5.8883
14.7549 | 135.0 | 2430 | 5.8916
14.6338 | 136.0 | 2448 | 5.8976
14.6065 | 137.0 | 2466 | 5.9038
14.6682 | 138.0 | 2484 | 5.9063
14.6515 | 139.0 | 2502 | 5.9084
14.713 | 140.0 | 2520 | 5.9124
14.5961 | 141.0 | 2538 | 5.9161
14.6323 | 142.0 | 2556 | 5.9189
14.3841 | 143.0 | 2574 | 5.9198
14.4734 | 144.0 | 2592 | 5.9218
14.4725 | 145.0 | 2610 | 5.9243
14.5212 | 146.0 | 2628 | 5.9275
14.5615 | 147.0 | 2646 | 5.9290
14.5326 | 148.0 | 2664 | 5.9303
14.4589 | 149.0 | 2682 | 5.9320
14.4737 | 150.0 | 2700 | 5.9347
14.5569 | 151.0 | 2718 | 5.9346
14.6112 | 152.0 | 2736 | 5.9364
14.4516 | 153.0 | 2754 | 5.9379
14.4938 | 154.0 | 2772 | 5.9372
14.4729 | 155.0 | 2790 | 5.9397
14.3714 | 156.0 | 2808 | 5.9405
14.3877 | 157.0 | 2826 | 5.9403
14.4449 | 158.0 | 2844 | 5.9419
14.349 | 159.0 | 2862 | 5.9414
14.3506 | 160.0 | 2880 | 5.9424
14.4542 | 161.0 | 2898 | 5.9425
14.3577 | 162.0 | 2916 | 5.9427
14.4566 | 163.0 | 2934 | 5.9427
14.4671 | 164.0 | 2952 | 5.9429
14.4485 | 165.0 | 2970 | 5.9430
14.4603 | 166.0 | 2988 | 5.9430
28.6518 | 166.6857 | 3000 | 5.9430
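
The validation loss bottoms out around 5.28 near epochs 52-55 (steps 936-990) and rises steadily afterwards while the training loss keeps falling; the reported evaluation loss of 5.9430 corresponds to the final checkpoint at step 3000. Below is a minimal sketch of loading that published checkpoint for evaluation; it assumes the repository id IParraMartin/impossible-llms-german-random-fourgram and a standard causal-LM head loadable via AutoModelForCausalLM (the card itself does not state the architecture).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is hosted under this repo id and exposes a
# standard causal-LM head; the card does not state the architecture.
repo_id = "IParraMartin/impossible-llms-german-random-fourgram"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Score a short German string under the model (illustrative input only).
text = "der hund den über sprang zaun"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, labels=inputs["input_ids"])
print(f"mean token cross-entropy: {out.loss.item():.4f}")
```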

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
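
As a convenience (not part of the original card), a quick way to compare a local environment against the versions listed above:

```python
# Print the installed versions of the libraries listed above so they can be
# checked against the versions used for training.
import datasets
import tokenizers
import torch
import transformers

for name, mod in [("Transformers", transformers), ("PyTorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")
```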
Safetensors

  • Model size: 126M params
  • Tensor type: F32