impossible-llms-spanish-random-trigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.7066
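
For a causal language model, the evaluation cross-entropy loss can be converted to perplexity by exponentiation. A minimal sketch of that conversion for the value reported above (note: the label-smoothing factor of 0.1 used in training inflates the raw loss slightly, so the true perplexity is somewhat lower than this figure):

```python
import math

# Final evaluation loss reported above.
eval_loss = 7.7066

# Perplexity is exp(cross-entropy loss); approximate because of label smoothing.
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")
```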

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
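
The derived values above follow from the per-device settings. A minimal sketch of the batch-size and warmup arithmetic, with names mirroring the Hugging Face Trainer settings (a reconstruction, not the original training script):

```python
# Per-device settings from the hyperparameter list above.
per_device_train_batch_size = 12
num_devices = 4
gradient_accumulation_steps = 8

# Effective train batch size = per-device batch × devices × accumulation steps.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
assert total_train_batch_size == 384  # matches the value reported above

# Effective eval batch size = per-device eval batch × devices.
per_device_eval_batch_size = 8
total_eval_batch_size = per_device_eval_batch_size * num_devices
assert total_eval_batch_size == 32  # matches the value reported above

# Warmup steps implied by warmup_ratio over the full training run.
training_steps = 3000
warmup_ratio = 0.1
warmup_steps = int(training_steps * warmup_ratio)
print(total_train_batch_size, total_eval_batch_size, warmup_steps)
```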

Training results

Training Loss Epoch Step Validation Loss
82.9733 1.0 8 10.1380
75.8627 2.0 16 9.3943
72.9515 3.0 24 9.0643
71.6032 4.0 32 8.9186
70.43 5.0 40 8.7530
68.9775 6.0 48 8.5889
67.4266 7.0 56 8.4027
65.7647 8.0 64 8.1834
63.9125 9.0 72 7.9671
62.2846 10.0 80 7.7450
60.585 11.0 88 7.5208
58.6086 12.0 96 7.3007
57.0585 13.0 104 7.0879
55.3789 14.0 112 6.8909
53.8711 15.0 120 6.7159
52.8598 16.0 128 6.5712
51.8931 17.0 136 6.4712
51.0249 18.0 144 6.3874
50.7649 19.0 152 6.3245
49.9938 20.0 160 6.2685
49.6 21.0 168 6.2156
49.3546 22.0 176 6.1711
49.0595 23.0 184 6.1315
48.5708 24.0 192 6.1018
48.2928 25.0 200 6.0545
47.994 26.0 208 6.0289
47.4027 27.0 216 5.9954
47.4567 28.0 224 5.9669
47.1224 29.0 232 5.9427
46.9763 30.0 240 5.9195
46.8013 31.0 248 5.8970
46.4533 32.0 256 5.8863
46.258 33.0 264 5.8621
46.0531 34.0 272 5.8469
45.8693 35.0 280 5.8301
45.7775 36.0 288 5.8110
45.4574 37.0 296 5.7973
45.384 38.0 304 5.7860
45.14 39.0 312 5.7715
44.9259 40.0 320 5.7627
44.8364 41.0 328 5.7461
44.5589 42.0 336 5.7299
44.2769 43.0 344 5.7123
44.2618 44.0 352 5.7029
44.1485 45.0 360 5.6984
43.8223 46.0 368 5.6889
43.6464 47.0 376 5.6691
43.5192 48.0 384 5.6708
43.3381 49.0 392 5.6586
43.1446 50.0 400 5.6500
42.92 51.0 408 5.6444
42.7591 52.0 416 5.6324
42.5581 53.0 424 5.6303
42.2034 54.0 432 5.6273
42.298 55.0 440 5.6196
42.0519 56.0 448 5.6137
41.7271 57.0 456 5.6116
41.475 58.0 464 5.6048
41.4602 59.0 472 5.6002
41.3604 60.0 480 5.6077
41.221 61.0 488 5.5998
41.0521 62.0 496 5.6056
40.794 63.0 504 5.6033
40.5781 64.0 512 5.6023
40.5309 65.0 520 5.6056
40.304 66.0 528 5.6056
39.9717 67.0 536 5.6076
39.7713 68.0 544 5.6163
39.7593 69.0 552 5.6194
39.5544 70.0 560 5.6221
39.4424 71.0 568 5.6286
39.1879 72.0 576 5.6344
38.9627 73.0 584 5.6377
38.7648 74.0 592 5.6504
38.6023 75.0 600 5.6627
38.4814 76.0 608 5.6589
38.3261 77.0 616 5.6667
38.0804 78.0 624 5.6782
37.8844 79.0 632 5.6908
37.6588 80.0 640 5.7084
37.4768 81.0 648 5.7185
37.3375 82.0 656 5.7297
37.031 83.0 664 5.7300
36.7817 84.0 672 5.7480
36.8966 85.0 680 5.7470
36.5525 86.0 688 5.7592
36.478 87.0 696 5.7775
36.176 88.0 704 5.7889
35.9584 89.0 712 5.8079
35.8365 90.0 720 5.8175
35.6729 91.0 728 5.8303
35.4745 92.0 736 5.8376
35.3489 93.0 744 5.8577
35.1739 94.0 752 5.8575
34.9136 95.0 760 5.8791
34.7444 96.0 768 5.9020
34.5735 97.0 776 5.9112
34.3098 98.0 784 5.9218
34.3126 99.0 792 5.9277
34.1658 100.0 800 5.9521
33.9487 101.0 808 5.9522
33.6928 102.0 816 5.9800
33.5653 103.0 824 5.9805
33.4678 104.0 832 5.9987
33.1124 105.0 840 6.0220
33.0137 106.0 848 6.0320
32.8984 107.0 856 6.0514
32.8075 108.0 864 6.0507
32.5741 109.0 872 6.0704
32.4494 110.0 880 6.0746
32.1832 111.0 888 6.1016
32.0472 112.0 896 6.1236
32.0669 113.0 904 6.1315
31.7332 114.0 912 6.1548
31.6171 115.0 920 6.1677
31.2678 116.0 928 6.1799
31.4348 117.0 936 6.1924
31.0856 118.0 944 6.2054
30.8131 119.0 952 6.2164
30.8044 120.0 960 6.2283
30.653 121.0 968 6.2490
30.2781 122.0 976 6.2766
30.3348 123.0 984 6.2750
30.1585 124.0 992 6.2834
30.0486 125.0 1000 6.2987
29.8699 126.0 1008 6.3086
29.7011 127.0 1016 6.3194
29.6424 128.0 1024 6.3398
29.436 129.0 1032 6.3594
29.2558 130.0 1040 6.3784
29.2634 131.0 1048 6.3784
28.9683 132.0 1056 6.3980
28.9409 133.0 1064 6.4186
28.7912 134.0 1072 6.4259
28.7214 135.0 1080 6.4451
28.3626 136.0 1088 6.4447
28.303 137.0 1096 6.4764
28.0938 138.0 1104 6.4765
27.9736 139.0 1112 6.4947
27.9646 140.0 1120 6.5150
27.7595 141.0 1128 6.5219
27.6672 142.0 1136 6.5349
27.5115 143.0 1144 6.5418
27.3989 144.0 1152 6.5525
27.2322 145.0 1160 6.5750
27.1536 146.0 1168 6.5788
26.918 147.0 1176 6.6096
26.93 148.0 1184 6.6063
26.8101 149.0 1192 6.6146
26.6292 150.0 1200 6.6443
26.5597 151.0 1208 6.6378
26.3258 152.0 1216 6.6476
26.3014 153.0 1224 6.6655
26.2341 154.0 1232 6.6871
26.0496 155.0 1240 6.7034
25.8795 156.0 1248 6.7092
25.8079 157.0 1256 6.7204
25.6752 158.0 1264 6.7203
25.521 159.0 1272 6.7521
25.4696 160.0 1280 6.7610
25.3268 161.0 1288 6.7741
25.1959 162.0 1296 6.7791
25.2836 163.0 1304 6.7898
25.1064 164.0 1312 6.8028
25.0024 165.0 1320 6.8166
24.9199 166.0 1328 6.8173
24.6732 167.0 1336 6.8342
24.6356 168.0 1344 6.8470
24.5186 169.0 1352 6.8576
24.4474 170.0 1360 6.8696
24.2977 171.0 1368 6.8838
24.2159 172.0 1376 6.8885
24.0174 173.0 1384 6.8984
24.0456 174.0 1392 6.9146
23.9057 175.0 1400 6.9252
23.836 176.0 1408 6.9345
23.7763 177.0 1416 6.9459
23.6371 178.0 1424 6.9581
23.4964 179.0 1432 6.9567
23.4087 180.0 1440 6.9668
23.4787 181.0 1448 6.9878
23.2909 182.0 1456 7.0021
23.2037 183.0 1464 7.0044
23.1077 184.0 1472 7.0095
23.1223 185.0 1480 7.0237
23.0181 186.0 1488 7.0253
22.8319 187.0 1496 7.0433
22.6766 188.0 1504 7.0584
22.5745 189.0 1512 7.0734
22.6136 190.0 1520 7.0746
22.6386 191.0 1528 7.0769
22.3468 192.0 1536 7.0861
22.3324 193.0 1544 7.1038
22.3003 194.0 1552 7.1050
22.2035 195.0 1560 7.1135
22.2508 196.0 1568 7.1215
22.0929 197.0 1576 7.1224
21.8521 198.0 1584 7.1385
21.8427 199.0 1592 7.1534
21.8101 200.0 1600 7.1633
21.8117 201.0 1608 7.1737
21.7302 202.0 1616 7.1755
21.6858 203.0 1624 7.1771
21.5795 204.0 1632 7.1999
21.4033 205.0 1640 7.1998
21.3557 206.0 1648 7.2087
21.362 207.0 1656 7.2162
21.2887 208.0 1664 7.2278
21.2647 209.0 1672 7.2368
21.1294 210.0 1680 7.2437
21.0053 211.0 1688 7.2425
20.8938 212.0 1696 7.2527
20.8835 213.0 1704 7.2560
20.8416 214.0 1712 7.2649
20.9264 215.0 1720 7.2681
20.7855 216.0 1728 7.2841
20.7273 217.0 1736 7.2881
20.6209 218.0 1744 7.2950
20.6245 219.0 1752 7.2968
20.5751 220.0 1760 7.3022
20.4651 221.0 1768 7.3177
20.421 222.0 1776 7.3193
20.396 223.0 1784 7.3280
20.2115 224.0 1792 7.3324
20.2426 225.0 1800 7.3388
20.1705 226.0 1808 7.3447
20.0706 227.0 1816 7.3593
20.0526 228.0 1824 7.3586
19.9699 229.0 1832 7.3730
19.9693 230.0 1840 7.3714
19.8774 231.0 1848 7.3788
19.7865 232.0 1856 7.3833
19.8824 233.0 1864 7.3846
19.6696 234.0 1872 7.3839
19.6567 235.0 1880 7.4056
19.5757 236.0 1888 7.4100
19.6795 237.0 1896 7.4117
19.6425 238.0 1904 7.4162
19.5103 239.0 1912 7.4196
19.4417 240.0 1920 7.4329
19.4904 241.0 1928 7.4365
19.3839 242.0 1936 7.4469
19.375 243.0 1944 7.4422
19.2889 244.0 1952 7.4488
19.2558 245.0 1960 7.4567
19.2278 246.0 1968 7.4600
19.1181 247.0 1976 7.4692
19.0961 248.0 1984 7.4738
19.0715 249.0 1992 7.4740
19.0496 250.0 2000 7.4824
19.0083 251.0 2008 7.4835
18.9141 252.0 2016 7.4959
18.9221 253.0 2024 7.4904
18.9279 254.0 2032 7.5009
18.8651 255.0 2040 7.4973
18.7985 256.0 2048 7.5030
18.7127 257.0 2056 7.5099
18.773 258.0 2064 7.5165
18.6578 259.0 2072 7.5124
18.6339 260.0 2080 7.5202
18.6373 261.0 2088 7.5244
18.6085 262.0 2096 7.5311
18.5759 263.0 2104 7.5326
18.604 264.0 2112 7.5401
18.5656 265.0 2120 7.5444
18.4235 266.0 2128 7.5528
18.5318 267.0 2136 7.5496
18.4513 268.0 2144 7.5502
18.3522 269.0 2152 7.5572
18.3817 270.0 2160 7.5672
18.3282 271.0 2168 7.5695
18.3291 272.0 2176 7.5691
18.2408 273.0 2184 7.5721
18.2657 274.0 2192 7.5742
18.2421 275.0 2200 7.5738
18.1431 276.0 2208 7.5764
18.1671 277.0 2216 7.5875
18.1461 278.0 2224 7.5894
18.0641 279.0 2232 7.5881
18.0364 280.0 2240 7.5972
18.0405 281.0 2248 7.5952
18.1256 282.0 2256 7.5977
17.9734 283.0 2264 7.6080
17.9075 284.0 2272 7.6076
17.9187 285.0 2280 7.6069
17.9586 286.0 2288 7.6043
17.882 287.0 2296 7.6186
17.8672 288.0 2304 7.6214
17.8878 289.0 2312 7.6164
17.8852 290.0 2320 7.6237
17.8945 291.0 2328 7.6245
17.7953 292.0 2336 7.6297
17.7679 293.0 2344 7.6304
17.7104 294.0 2352 7.6286
17.7274 295.0 2360 7.6329
17.8378 296.0 2368 7.6313
17.7059 297.0 2376 7.6407
17.6694 298.0 2384 7.6439
17.6471 299.0 2392 7.6410
17.6261 300.0 2400 7.6451
17.6186 301.0 2408 7.6494
17.6226 302.0 2416 7.6481
17.5872 303.0 2424 7.6518
17.539 304.0 2432 7.6539
17.5774 305.0 2440 7.6555
17.5556 306.0 2448 7.6611
17.5351 307.0 2456 7.6561
17.4769 308.0 2464 7.6581
17.5185 309.0 2472 7.6641
17.4876 310.0 2480 7.6592
17.4532 311.0 2488 7.6637
17.4323 312.0 2496 7.6661
17.4594 313.0 2504 7.6679
17.4059 314.0 2512 7.6667
17.4346 315.0 2520 7.6785
17.4066 316.0 2528 7.6736
17.4681 317.0 2536 7.6786
17.432 318.0 2544 7.6754
17.3969 319.0 2552 7.6804
17.4143 320.0 2560 7.6799
17.3367 321.0 2568 7.6818
17.3314 322.0 2576 7.6851
17.3424 323.0 2584 7.6841
17.4023 324.0 2592 7.6839
17.2825 325.0 2600 7.6870
17.2865 326.0 2608 7.6857
17.3422 327.0 2616 7.6866
17.3048 328.0 2624 7.6901
17.2966 329.0 2632 7.6886
17.2027 330.0 2640 7.6900
17.31 331.0 2648 7.6911
17.376 332.0 2656 7.6930
17.2862 333.0 2664 7.6943
17.2696 334.0 2672 7.6960
17.2753 335.0 2680 7.6968
17.17 336.0 2688 7.6963
17.1251 337.0 2696 7.6971
17.2253 338.0 2704 7.6966
17.2259 339.0 2712 7.6996
17.1681 340.0 2720 7.6974
17.2241 341.0 2728 7.6990
17.2255 342.0 2736 7.6996
17.2979 343.0 2744 7.7002
17.1608 344.0 2752 7.6985
17.1699 345.0 2760 7.7017
17.225 346.0 2768 7.7016
17.1287 347.0 2776 7.7021
17.2054 348.0 2784 7.7034
17.2621 349.0 2792 7.7025
17.1999 350.0 2800 7.7041
17.1437 351.0 2808 7.7053
17.1945 352.0 2816 7.7044
17.1877 353.0 2824 7.7030
17.0871 354.0 2832 7.7059
17.1505 355.0 2840 7.7050
17.1964 356.0 2848 7.7052
17.1234 357.0 2856 7.7060
17.2103 358.0 2864 7.7056
17.2273 359.0 2872 7.7047
17.0804 360.0 2880 7.7061
17.0981 361.0 2888 7.7064
17.0298 362.0 2896 7.7056
17.1634 363.0 2904 7.7053
17.0846 364.0 2912 7.7061
17.1246 365.0 2920 7.7067
17.1628 366.0 2928 7.7064
17.2124 367.0 2936 7.7063
17.085 368.0 2944 7.7064
17.0525 369.0 2952 7.7065
17.0698 370.0 2960 7.7065
17.1785 371.0 2968 7.7066
17.0874 372.0 2976 7.7066
17.1582 373.0 2984 7.7066
17.1624 374.0 2992 7.7066
17.1268 375.0 3000 7.7066
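
Validation loss bottoms out near 5.60 around epoch 61 (step 488) and then climbs steadily back to 7.71 while training loss keeps falling, a classic overfitting pattern. A small sketch for locating the best checkpoint in a whitespace-delimited table like the one above (only a few rows from the table are inlined here for illustration):

```python
# Rows copied from the "Training results" table above:
# train_loss  epoch  step  val_loss
rows = """
41.475 58.0 464 5.6048
41.4602 59.0 472 5.6002
41.221 61.0 488 5.5998
41.0521 62.0 496 5.6056
17.1268 375.0 3000 7.7066
"""

# Parse each line into a (train_loss, epoch, step, val_loss) tuple.
parsed = [
    tuple(float(x) for x in line.split())
    for line in rows.strip().splitlines()
]

# The best checkpoint is the row with the lowest validation loss.
best = min(parsed, key=lambda r: r[3])
print(f"best: epoch {best[1]:.0f}, step {best[2]:.0f}, val loss {best[3]:.4f}")
```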

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model details

  • Model size: 126M parameters
  • Tensor type: F32 (safetensors)