impossible-llms-english-natural

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9845

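If this value is the mean token-level cross-entropy in nats, it corresponds to a perplexity of roughly exp(3.9845) ≈ 53.8. Note that training used label smoothing (factor 0.1, listed below), which inflates the reported loss relative to an unsmoothed objective. A minimal sketch of the conversion:

```python
import math

eval_loss = 3.9845  # final evaluation loss reported above

# Perplexity is the exponential of the mean cross-entropy (in nats).
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 53.8
```
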
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
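
These settings imply an effective training batch size of 12 per device × 4 GPUs × 8 gradient-accumulation steps = 384, matching the total_train_batch_size above. A minimal sketch of how they map onto Hugging Face TrainingArguments, assuming the run used the Trainer API (output_dir is a placeholder, and fp16 is assumed for "Native AMP"):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="impossible-llms-english-natural",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,  # 12 x 4 GPUs x 8 = 384 effective batch
    optim="adamw_torch",            # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                      # "Native AMP" mixed precision (assumed fp16)
    label_smoothing_factor=0.1,
)
```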

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 58.027        | 1.0     | 86   | 7.1861          |
| 44.3552       | 2.0     | 172  | 5.5613          |
| 42.3359       | 3.0     | 258  | 5.2823          |
| 40.688        | 4.0     | 344  | 5.0685          |
| 38.9604       | 5.0     | 430  | 4.8318          |
| 37.6082       | 6.0     | 516  | 4.6480          |
| 36.6465       | 7.0     | 602  | 4.5107          |
| 35.7031       | 8.0     | 688  | 4.4041          |
| 34.7611       | 9.0     | 774  | 4.3191          |
| 34.6912       | 10.0    | 860  | 4.2578          |
| 33.8642       | 11.0    | 946  | 4.2083          |
| 33.3426       | 12.0    | 1032 | 4.1692          |
| 33.1165       | 13.0    | 1118 | 4.1382          |
| 32.8416       | 14.0    | 1204 | 4.1110          |
| 32.4453       | 15.0    | 1290 | 4.0879          |
| 32.297        | 16.0    | 1376 | 4.0682          |
| 32.2745       | 17.0    | 1462 | 4.0541          |
| 31.8602       | 18.0    | 1548 | 4.0416          |
| 31.5979       | 19.0    | 1634 | 4.0296          |
| 31.7409       | 20.0    | 1720 | 4.0234          |
| 31.4115       | 21.0    | 1806 | 4.0143          |
| 31.3564       | 22.0    | 1892 | 4.0074          |
| 31.1016       | 23.0    | 1978 | 4.0023          |
| 30.8809       | 24.0    | 2064 | 3.9992          |
| 31.0388       | 25.0    | 2150 | 3.9948          |
| 30.9397       | 26.0    | 2236 | 3.9915          |
| 30.9424       | 27.0    | 2322 | 3.9893          |
| 30.9243       | 28.0    | 2408 | 3.9881          |
| 30.6877       | 29.0    | 2494 | 3.9873          |
| 30.5782       | 30.0    | 2580 | 3.9858          |
| 30.3729       | 31.0    | 2666 | 3.9851          |
| 30.5981       | 32.0    | 2752 | 3.9848          |
| 30.7725       | 33.0    | 2838 | 3.9844          |
| 30.5009       | 34.0    | 2924 | 3.9844          |
| 30.614        | 34.8837 | 3000 | 3.9845          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model details

  • Model size: 126M params
  • Tensor type: F32
  • Format: Safetensors
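
A minimal usage sketch, assuming the checkpoint loads as a causal language model under the hub id IParraMartin/impossible-llms-english-natural (the AutoModelForCausalLM head and the generation settings are assumptions, not confirmed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IParraMartin/impossible-llms-english-natural"  # hub id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # assumes a causal LM head

# Generate a short continuation from a prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```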
