impossible-llms-german-mirror-reversal

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6947
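
Since training uses label smoothing of 0.1 (see the hyperparameters below), this is the smoothed cross-entropy in nats per token; exponentiating it gives a perplexity of roughly 297. A minimal sketch of the conversion:

```python
import math

eval_loss = 5.6947  # final validation loss reported above

# Perplexity is the exponentiated mean cross-entropy. The reported loss
# includes label smoothing (factor 0.1), so this slightly overstates the
# perplexity of the underlying unsmoothed model.
print(f"perplexity ~ {math.exp(eval_loss):.1f}")  # ~ 297.3
```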

Model description

More information needed

Intended uses & limitations

More information needed
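
Pending further documentation, the checkpoint can still be loaded from the Hub in the usual way. A minimal sketch, assuming the model exposes a causal language-modeling head (the repository id is this card's Hub path; adjust the Auto class if the architecture differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-german-mirror-reversal"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Passing labels=input_ids makes the forward pass return the mean
# cross-entropy loss, i.e. the same quantity tracked during training.
inputs = tokenizer("Der Hund schläft im Garten.", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"loss: {outputs.loss.item():.4f}")
```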

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
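
These settings map directly onto transformers' TrainingArguments. A minimal sketch, assuming Transformers 4.49 field names (output_dir is a placeholder, fp16 is assumed for Native AMP, and multi-GPU distribution comes from the launcher, e.g. torchrun, not from these arguments):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=12,  # x 4 GPUs x 8 accumulation = 384 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    seed=0,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # 10% of 3000 steps = 300 warmup steps
    max_steps=3000,
    fp16=True,                       # Native AMP (assuming fp16, not bf16)
    label_smoothing_factor=0.1,
)
```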

Training results

The table below lists the training loss, epoch, step, and validation loss at each evaluation. Validation loss reaches its minimum (4.9705 at step 960) and rises steadily afterwards, so later checkpoints fit the training data at the expense of validation performance; the final checkpoint at step 3000 ends at 5.6947.

Training Loss Epoch Step Validation Loss
77.7786 0.9449 15 9.5985
73.362 1.9449 30 9.1037
70.3305 2.9449 45 8.7617
66.9578 3.9449 60 8.3330
63.7325 4.9449 75 7.9229
60.5812 5.9449 90 7.5069
57.2013 6.9449 105 7.0945
54.1672 7.9449 120 6.7232
51.9153 8.9449 135 6.4424
50.4689 9.9449 150 6.2799
49.3935 10.9449 165 6.1810
49.0969 11.9449 180 6.1140
48.3087 12.9449 195 6.0453
47.9775 13.9449 210 5.9866
47.6174 14.9449 225 5.9371
47.4572 15.9449 240 5.9014
46.8932 16.9449 255 5.8724
46.6447 17.9449 270 5.8382
46.3224 18.9449 285 5.8003
46.1563 19.9449 300 5.7602
45.7068 20.9449 315 5.7192
45.2657 21.9449 330 5.6639
44.8004 22.9449 345 5.6269
44.7603 23.9449 360 5.5749
44.2137 24.9449 375 5.5330
43.6478 25.9449 390 5.4967
43.3557 26.9449 405 5.4637
43.3914 27.9449 420 5.4229
42.8991 28.9449 435 5.3885
42.4619 29.9449 450 5.3620
42.2611 30.9449 465 5.3354
41.9936 31.9449 480 5.3113
41.6828 32.9449 495 5.2834
41.6263 33.9449 510 5.2550
41.2193 34.9449 525 5.2344
41.4535 35.9449 540 5.2161
40.7891 36.9449 555 5.1963
40.568 37.9449 570 5.1749
40.4075 38.9449 585 5.1583
40.2574 39.9449 600 5.1434
39.9599 40.9449 615 5.1260
39.7297 41.9449 630 5.1126
39.7484 42.9449 645 5.0950
39.1652 43.9449 660 5.0796
39.2023 44.9449 675 5.0676
39.1437 45.9449 690 5.0592
38.7851 46.9449 705 5.0426
38.6504 47.9449 720 5.0363
38.1626 48.9449 735 5.0278
38.4464 49.9449 750 5.0176
37.8782 50.9449 765 5.0095
37.6813 51.9449 780 5.0009
37.6351 52.9449 795 4.9932
37.4911 53.9449 810 4.9910
37.1763 54.9449 825 4.9878
36.9238 55.9449 840 4.9850
36.9422 56.9449 855 4.9782
36.7798 57.9449 870 4.9746
36.726 58.9449 885 4.9732
36.5733 59.9449 900 4.9719
36.4625 60.9449 915 4.9713
36.136 61.9449 930 4.9738
36.0224 62.9449 945 4.9767
35.8764 63.9449 960 4.9705
35.6802 64.9449 975 4.9749
35.6014 65.9449 990 4.9774
35.3747 66.9449 1005 4.9793
35.4135 67.9449 1020 4.9885
35.1404 68.9449 1035 4.9830
34.8309 69.9449 1050 4.9920
34.785 70.9449 1065 4.9935
34.3068 71.9449 1080 4.9965
34.4988 72.9449 1095 5.0031
34.2062 73.9449 1110 5.0086
34.045 74.9449 1125 5.0116
33.8394 75.9449 1140 5.0191
33.9087 76.9449 1155 5.0220
33.6995 77.9449 1170 5.0358
33.512 78.9449 1185 5.0355
33.2894 79.9449 1200 5.0434
33.4304 80.9449 1215 5.0478
33.0774 81.9449 1230 5.0598
32.9423 82.9449 1245 5.0638
32.608 83.9449 1260 5.0722
32.6954 84.9449 1275 5.0839
32.3901 85.9449 1290 5.0859
32.4082 86.9449 1305 5.0956
32.1909 87.9449 1320 5.1047
32.0841 88.9449 1335 5.1168
32.0165 89.9449 1350 5.1258
32.0558 90.9449 1365 5.1325
31.656 91.9449 1380 5.1392
31.6265 92.9449 1395 5.1544
31.4545 93.9449 1410 5.1621
31.1955 94.9449 1425 5.1681
31.0298 95.9449 1440 5.1818
31.2093 96.9449 1455 5.1897
30.9217 97.9449 1470 5.1986
30.8671 98.9449 1485 5.2092
30.7616 99.9449 1500 5.2205
30.531 100.9449 1515 5.2250
30.4142 101.9449 1530 5.2317
30.3193 102.9449 1545 5.2458
30.1596 103.9449 1560 5.2517
30.1089 104.9449 1575 5.2605
30.0351 105.9449 1590 5.2727
29.8713 106.9449 1605 5.2791
29.8566 107.9449 1620 5.2899
29.7063 108.9449 1635 5.2969
29.5488 109.9449 1650 5.3043
29.4285 110.9449 1665 5.3145
29.1821 111.9449 1680 5.3240
29.3884 112.9449 1695 5.3328
29.1438 113.9449 1710 5.3431
29.0321 114.9449 1725 5.3542
28.998 115.9449 1740 5.3580
28.9714 116.9449 1755 5.3698
28.8438 117.9449 1770 5.3755
28.6202 118.9449 1785 5.3846
28.6363 119.9449 1800 5.3947
28.6094 120.9449 1815 5.4002
28.3833 121.9449 1830 5.4084
28.4819 122.9449 1845 5.4212
28.418 123.9449 1860 5.4274
28.2616 124.9449 1875 5.4369
28.0444 125.9449 1890 5.4412
28.0322 126.9449 1905 5.4545
28.0176 127.9449 1920 5.4559
27.8175 128.9449 1935 5.4680
27.7974 129.9449 1950 5.4714
27.815 130.9449 1965 5.4778
27.7268 131.9449 1980 5.4866
27.595 132.9449 1995 5.4903
27.4416 133.9449 2010 5.5030
27.3969 134.9449 2025 5.5100
27.3858 135.9449 2040 5.5189
27.0041 136.9449 2055 5.5187
27.2507 137.9449 2070 5.5258
27.0918 138.9449 2085 5.5368
27.098 139.9449 2100 5.5404
26.7923 140.9449 2115 5.5479
26.9444 141.9449 2130 5.5520
26.9558 142.9449 2145 5.5585
26.8615 143.9449 2160 5.5629
26.8124 144.9449 2175 5.5693
26.8456 145.9449 2190 5.5758
26.5996 146.9449 2205 5.5804
26.7836 147.9449 2220 5.5818
26.5417 148.9449 2235 5.5920
26.5924 149.9449 2250 5.5945
26.5649 150.9449 2265 5.6005
26.4758 151.9449 2280 5.6091
26.2602 152.9449 2295 5.6084
26.2638 153.9449 2310 5.6149
26.2578 154.9449 2325 5.6211
26.3442 155.9449 2340 5.6221
26.0737 156.9449 2355 5.6257
26.1811 157.9449 2370 5.6315
26.2782 158.9449 2385 5.6358
26.1001 159.9449 2400 5.6376
26.0365 160.9449 2415 5.6427
26.0782 161.9449 2430 5.6443
26.1369 162.9449 2445 5.6499
26.0939 163.9449 2460 5.6521
25.9941 164.9449 2475 5.6562
26.049 165.9449 2490 5.6600
25.784 166.9449 2505 5.6608
25.9461 167.9449 2520 5.6622
25.8137 168.9449 2535 5.6665
25.8926 169.9449 2550 5.6685
25.7181 170.9449 2565 5.6709
25.7943 171.9449 2580 5.6730
25.6186 172.9449 2595 5.6737
25.9311 173.9449 2610 5.6773
25.7088 174.9449 2625 5.6787
25.6958 175.9449 2640 5.6804
25.6956 176.9449 2655 5.6818
25.7138 177.9449 2670 5.6831
25.6855 178.9449 2685 5.6839
25.6264 179.9449 2700 5.6860
25.6692 180.9449 2715 5.6860
25.6203 181.9449 2730 5.6873
25.6981 182.9449 2745 5.6894
25.6047 183.9449 2760 5.6907
25.6367 184.9449 2775 5.6902
25.5185 185.9449 2790 5.6920
25.4989 186.9449 2805 5.6921
25.6538 187.9449 2820 5.6925
25.6601 188.9449 2835 5.6932
25.4707 189.9449 2850 5.6938
25.4421 190.9449 2865 5.6940
25.4802 191.9449 2880 5.6939
25.5282 192.9449 2895 5.6939
25.551 193.9449 2910 5.6949
25.5775 194.9449 2925 5.6945
25.5204 195.9449 2940 5.6946
25.6957 196.9449 2955 5.6946
25.6447 197.9449 2970 5.6946
25.5091 198.9449 2985 5.6947
25.4451 199.9449 3000 5.6947

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
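
When reproducing the setup, it may help to verify that the installed versions match the pins above. A small check, assuming the packages are importable:

```python
import datasets
import tokenizers
import torch
import transformers

# Expected versions from this card: Transformers 4.49.0,
# PyTorch 2.4.0+cu121, Datasets 3.4.0, Tokenizers 0.21.0.
for name, module in [
    ("transformers", transformers),
    ("torch", torch),
    ("datasets", datasets),
    ("tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```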

Model size

  • 126M parameters
  • Tensor type: F32 (stored as Safetensors)