impossible-llms-german-fronting-n

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.8468
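
For orientation, the exponential of a cross-entropy loss can be read as a perplexity. Training used label_smoothing_factor 0.1 (see the hyperparameters below), so the reported loss is the label-smoothed objective rather than plain cross-entropy; the conversion below is only a rough sketch.

```python
import math

# Final evaluation loss reported above (label-smoothed objective).
eval_loss = 5.8468

# Rough perplexity estimate; label smoothing inflates the loss somewhat,
# so this overstates the true cross-entropy perplexity.
perplexity = math.exp(eval_loss)
print(f"approximate perplexity: {perplexity:.1f}")  # ~346
```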

Model description

More information needed

Intended uses & limitations

More information needed
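
Pending a fuller description, below is a minimal loading and scoring sketch. It assumes the checkpoint is a causal language model loadable with AutoModelForCausalLM and that a tokenizer is stored in the same repository; if the architecture differs, substitute the appropriate Auto class.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repository hosts a causal LM plus its tokenizer.
repo_id = "IParraMartin/impossible-llms-german-fronting-n"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Score a German sentence by its mean token-level negative log-likelihood.
sentence = "Der Hund schläft im Garten."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(f"mean NLL: {outputs.loss.item():.4f}")
```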

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an approximate configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
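
Expressed as transformers.TrainingArguments, the settings above correspond roughly to the sketch below. This is a reconstruction from the listed values, not the original training script; the output directory is a placeholder, and the 4-GPU distributed setup is handled by the launcher rather than by these arguments.

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters (not the original script).
# Effective train batch size: 12 per device x 4 GPUs x 8 accumulation steps = 384.
# Requires a CUDA device because of fp16 mixed precision.
training_args = TrainingArguments(
    output_dir="impossible-llms-german-fronting-n",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed precision (assumed fp16 rather than bf16)
    label_smoothing_factor=0.1,
)
```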

Training results

Training Loss Epoch Step Validation Loss
76.0232 0.9697 16 9.4318
71.6347 1.9697 32 8.9520
69.2628 2.9697 48 8.6144
65.8791 3.9697 64 8.2167
62.691 4.9697 80 7.7872
58.9734 5.9697 96 7.3124
55.5873 6.9697 112 6.8861
52.7993 7.9697 128 6.5625
51.0556 8.9697 144 6.3603
50.0798 9.9697 160 6.2534
49.2308 10.9697 176 6.1755
48.881 11.9697 192 6.0952
48.2787 12.9697 208 6.0302
47.6611 13.9697 224 5.9747
47.4827 14.9697 240 5.9406
47.0137 15.9697 256 5.8943
46.7397 16.9697 272 5.8559
46.583 17.9697 288 5.8218
45.9147 18.9697 304 5.7926
45.7679 19.9697 320 5.7632
45.4455 20.9697 336 5.7360
45.2777 21.9697 352 5.7102
44.906 22.9697 368 5.6821
44.7446 23.9697 384 5.6569
44.0693 24.9697 400 5.6278
44.2293 25.9697 416 5.6006
43.4461 26.9697 432 5.5720
43.3857 27.9697 448 5.5360
43.198 28.9697 464 5.5073
43.1244 29.9697 480 5.4757
42.6024 30.9697 496 5.4414
42.2986 31.9697 512 5.4172
41.842 32.9697 528 5.3924
41.6332 33.9697 544 5.3675
41.789 34.9697 560 5.3394
41.1334 35.9697 576 5.3210
41.0109 36.9697 592 5.3030
40.7009 37.9697 608 5.2816
40.5458 38.9697 624 5.2644
40.3853 39.9697 640 5.2487
40.2776 40.9697 656 5.2370
39.9171 41.9697 672 5.2202
39.7794 42.9697 688 5.2064
39.5459 43.9697 704 5.1913
39.3539 44.9697 720 5.1830
38.9157 45.9697 736 5.1793
39.1049 46.9697 752 5.1649
38.921 47.9697 768 5.1577
38.4288 48.9697 784 5.1540
38.497 49.9697 800 5.1463
37.997 50.9697 816 5.1410
37.8313 51.9697 832 5.1420
37.8096 52.9697 848 5.1331
37.5963 53.9697 864 5.1320
37.4463 54.9697 880 5.1352
37.0942 55.9697 896 5.1308
37.2328 56.9697 912 5.1299
36.9816 57.9697 928 5.1315
36.6824 58.9697 944 5.1340
36.6477 59.9697 960 5.1331
36.5911 60.9697 976 5.1394
36.1343 61.9697 992 5.1435
36.2106 62.9697 1008 5.1430
35.8942 63.9697 1024 5.1495
35.6054 64.9697 1040 5.1543
35.6624 65.9697 1056 5.1588
35.5152 66.9697 1072 5.1647
35.318 67.9697 1088 5.1660
35.1982 68.9697 1104 5.1704
35.0986 69.9697 1120 5.1797
34.9194 70.9697 1136 5.1890
34.7239 71.9697 1152 5.1941
34.5779 72.9697 1168 5.1978
34.2269 73.9697 1184 5.2058
34.2593 74.9697 1200 5.2147
33.9711 75.9697 1216 5.2208
33.7718 76.9697 1232 5.2301
33.8966 77.9697 1248 5.2399
33.6011 78.9697 1264 5.2466
33.5622 79.9697 1280 5.2532
33.3978 80.9697 1296 5.2617
33.1934 81.9697 1312 5.2759
32.9452 82.9697 1328 5.2842
33.1958 83.9697 1344 5.2890
32.7968 84.9697 1360 5.3002
32.865 85.9697 1376 5.3107
32.5428 86.9697 1392 5.3176
32.2658 87.9697 1408 5.3299
32.3847 88.9697 1424 5.3376
32.2485 89.9697 1440 5.3438
32.0939 90.9697 1456 5.3599
32.0239 91.9697 1472 5.3687
31.7606 92.9697 1488 5.3735
31.7933 93.9697 1504 5.3841
31.6453 94.9697 1520 5.3990
31.5913 95.9697 1536 5.4016
31.0883 96.9697 1552 5.4143
31.244 97.9697 1568 5.4226
31.193 98.9697 1584 5.4352
31.0987 99.9697 1600 5.4415
30.88 100.9697 1616 5.4481
30.8143 101.9697 1632 5.4619
30.6025 102.9697 1648 5.4711
30.6779 103.9697 1664 5.4759
30.6561 104.9697 1680 5.4852
30.6397 105.9697 1696 5.4980
30.3599 106.9697 1712 5.5075
30.1229 107.9697 1728 5.5187
30.1375 108.9697 1744 5.5271
29.9615 109.9697 1760 5.5319
29.9015 110.9697 1776 5.5453
29.6813 111.9697 1792 5.5521
29.8179 112.9697 1808 5.5558
29.6817 113.9697 1824 5.5707
29.4011 114.9697 1840 5.5713
29.54 115.9697 1856 5.5882
29.3389 116.9697 1872 5.5973
29.387 117.9697 1888 5.6045
29.1321 118.9697 1904 5.6094
29.1001 119.9697 1920 5.6183
29.1747 120.9697 1936 5.6299
29.0975 121.9697 1952 5.6360
28.9631 122.9697 1968 5.6405
28.888 123.9697 1984 5.6492
28.6687 124.9697 2000 5.6527
28.6548 125.9697 2016 5.6607
28.7201 126.9697 2032 5.6679
28.7214 127.9697 2048 5.6766
28.4436 128.9697 2064 5.6845
28.503 129.9697 2080 5.6842
28.4427 130.9697 2096 5.6931
28.4169 131.9697 2112 5.7016
28.443 132.9697 2128 5.7068
28.1858 133.9697 2144 5.7126
28.2171 134.9697 2160 5.7192
28.1178 135.9697 2176 5.7239
28.0608 136.9697 2192 5.7297
28.0232 137.9697 2208 5.7347
27.9148 138.9697 2224 5.7467
27.8405 139.9697 2240 5.7435
27.8553 140.9697 2256 5.7536
27.8158 141.9697 2272 5.7605
27.7173 142.9697 2288 5.7609
27.6875 143.9697 2304 5.7643
27.6526 144.9697 2320 5.7705
27.6147 145.9697 2336 5.7761
27.5678 146.9697 2352 5.7805
27.6038 147.9697 2368 5.7839
27.6582 148.9697 2384 5.7891
27.4517 149.9697 2400 5.7924
27.5229 150.9697 2416 5.7951
27.4243 151.9697 2432 5.7994
27.4451 152.9697 2448 5.8036
27.4183 153.9697 2464 5.8066
27.3851 154.9697 2480 5.8106
27.2778 155.9697 2496 5.8111
27.4134 156.9697 2512 5.8158
27.1447 157.9697 2528 5.8176
27.3356 158.9697 2544 5.8200
27.2262 159.9697 2560 5.8243
27.0929 160.9697 2576 5.8259
27.2497 161.9697 2592 5.8273
27.2744 162.9697 2608 5.8309
27.0844 163.9697 2624 5.8315
27.1939 164.9697 2640 5.8306
27.0606 165.9697 2656 5.8347
27.0495 166.9697 2672 5.8352
27.0569 167.9697 2688 5.8362
27.0757 168.9697 2704 5.8377
26.9603 169.9697 2720 5.8386
27.1405 170.9697 2736 5.8409
27.0234 171.9697 2752 5.8422
27.1466 172.9697 2768 5.8432
27.0645 173.9697 2784 5.8424
27.007 174.9697 2800 5.8439
27.0504 175.9697 2816 5.8447
26.9868 176.9697 2832 5.8446
26.8777 177.9697 2848 5.8461
27.0836 178.9697 2864 5.8453
26.9487 179.9697 2880 5.8466
26.8259 180.9697 2896 5.8465
26.7474 181.9697 2912 5.8468
26.9062 182.9697 2928 5.8467
27.0621 183.9697 2944 5.8467
27.0385 184.9697 2960 5.8468
26.9459 185.9697 2976 5.8468
26.7926 186.9697 2992 5.8468
27.1117 187.4848 3000 5.8468
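
Reading the table, the validation loss reaches its minimum of about 5.13 around step 912 (epoch ~57) and then climbs steadily to the final 5.8468 while the training loss keeps falling, a typical overfitting pattern. As an illustration only (not part of the original configuration), the sketch below shows Trainer settings that would retain the lowest-validation-loss checkpoint instead of the final one.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Illustrative sketch: keep the best checkpoint by validation loss.
# These settings were not used in the run documented above.
training_args = TrainingArguments(
    output_dir="checkpoints",            # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    max_steps=3000,
)

# Pass to Trainer(callbacks=[...]) to stop once eval loss stops improving.
early_stopping = EarlyStoppingCallback(early_stopping_patience=5)
```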

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model size

  • 126M parameters (F32, safetensors)