impossible-llms-english-random-fourgram

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.4904
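A minimal usage sketch follows; it assumes the checkpoint is a causal language model with a bundled tokenizer, loadable through the standard transformers auto classes (the input string is an arbitrary example, not from the training data):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo hosts a causal LM checkpoint plus tokenizer files.
repo_id = "IParraMartin/impossible-llms-english-random-fourgram"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Score an arbitrary input; labels = inputs gives the LM cross-entropy loss.
inputs = tokenizer("the of and to a", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())  # mean token-level cross-entropy on this input
```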

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
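For reference, here is a hedged sketch of how the values above map onto transformers TrainingArguments. The output_dir is hypothetical, and the 4-GPU multi-GPU setup is supplied by the launcher (e.g. torchrun), not by these arguments:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="impossible-llms-english-random-fourgram",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 12 per device x 4 GPUs x 8 steps = 384 effective batch
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed precision; bf16 would also qualify
    label_smoothing_factor=0.1,
)
```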

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 21.2318       | 1.0     | 96   | 7.0318          |
| 17.3066       | 2.0     | 192  | 5.8019          |
| 16.9078       | 3.0     | 288  | 5.6130          |
| 16.3046       | 4.0     | 384  | 5.3811          |
| 15.7736       | 5.0     | 480  | 5.1867          |
| 15.2597       | 6.0     | 576  | 5.0483          |
| 14.9974       | 7.0     | 672  | 4.9447          |
| 14.6845       | 8.0     | 768  | 4.8589          |
| 14.5313       | 9.0     | 864  | 4.7952          |
| 14.425        | 10.0    | 960  | 4.7419          |
| 14.09         | 11.0    | 1056 | 4.6954          |
| 13.959        | 12.0    | 1152 | 4.6586          |
| 13.9513       | 13.0    | 1248 | 4.6308          |
| 13.7675       | 14.0    | 1344 | 4.6051          |
| 13.6601       | 15.0    | 1440 | 4.5844          |
| 13.5687       | 16.0    | 1536 | 4.5667          |
| 13.5257       | 17.0    | 1632 | 4.5534          |
| 13.4789       | 18.0    | 1728 | 4.5398          |
| 13.4417       | 19.0    | 1824 | 4.5290          |
| 13.3908       | 20.0    | 1920 | 4.5210          |
| 13.307        | 21.0    | 2016 | 4.5132          |
| 13.3016       | 22.0    | 2112 | 4.5081          |
| 13.2893       | 23.0    | 2208 | 4.5023          |
| 13.2032       | 24.0    | 2304 | 4.4990          |
| 13.1012       | 25.0    | 2400 | 4.4962          |
| 13.1562       | 26.0    | 2496 | 4.4939          |
| 12.9843       | 27.0    | 2592 | 4.4923          |
| 13.0885       | 28.0    | 2688 | 4.4913          |
| 13.0813       | 29.0    | 2784 | 4.4908          |
| 13.1086       | 30.0    | 2880 | 4.4904          |
| 13.1765       | 31.0    | 2976 | 4.4904          |
| 34.9023       | 31.2516 | 3000 | 4.4904          |
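If the reported validation loss is mean per-token cross-entropy in nats, the final value corresponds to a perplexity of roughly exp(4.4904) ≈ 89.2. Note that the label smoothing factor of 0.1 inflates the raw loss relative to unsmoothed cross-entropy, so this is only an approximate figure:

```python
import math

# Convert the final validation loss to perplexity, assuming it is mean
# per-token cross-entropy in nats (label smoothing makes this approximate).
val_loss = 4.4904
print(math.exp(val_loss))  # ≈ 89.2
```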

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
Model size

  • 126M parameters (Safetensors, F32)