impossible-llms-english-fronting-bigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.4268

Model description

More information needed

Intended uses & limitations

More information needed
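
In the absence of documented usage, the following is a minimal loading sketch. It assumes the checkpoint is a causal language model hosted under the Hub repository id IParraMartin/impossible-llms-english-fronting-bigram (as listed in the collection for this card) and that a tokenizer is bundled with it; neither is confirmed by this card.

```python
# Minimal usage sketch; assumes a causal LM architecture and a bundled tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "IParraMartin/impossible-llms-english-fronting-bigram"  # Hub id assumed from this card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("The cat sat on the", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```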

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
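
A minimal sketch of how these hyperparameters might map onto Hugging Face TrainingArguments in Transformers 4.49. The actual training script is not published, so the output path and the exact mixed-precision flag used for Native AMP (fp16 vs. bf16) are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="impossible-llms-english-fronting-bigram",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,
    # Effective train batch size: 12 per device x 4 GPUs x 8 accumulation steps = 384.
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" in the card; bf16 is also possible on this assumption
    label_smoothing_factor=0.1,
)
```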

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 22.3579       | 1.0     | 87   | 7.3625          |
| 17.7285       | 2.0     | 174  | 5.9282          |
| 17.3131       | 3.0     | 261  | 5.7417          |
| 16.9484       | 4.0     | 348  | 5.5702          |
| 16.2015       | 5.0     | 435  | 5.3600          |
| 15.635        | 6.0     | 522  | 5.1832          |
| 15.2242       | 7.0     | 609  | 5.0535          |
| 14.9803       | 8.0     | 696  | 4.9441          |
| 14.693        | 9.0     | 783  | 4.8592          |
| 14.4182       | 10.0    | 870  | 4.7920          |
| 14.3186       | 11.0    | 957  | 4.7325          |
| 14.0921       | 12.0    | 1044 | 4.6868          |
| 13.8969       | 13.0    | 1131 | 4.6437          |
| 13.8353       | 14.0    | 1218 | 4.6098          |
| 13.6798       | 15.0    | 1305 | 4.5795          |
| 13.637        | 16.0    | 1392 | 4.5563          |
| 13.5227       | 17.0    | 1479 | 4.5350          |
| 13.4718       | 18.0    | 1566 | 4.5154          |
| 13.2136       | 19.0    | 1653 | 4.4986          |
| 13.3515       | 20.0    | 1740 | 4.4878          |
| 13.2931       | 21.0    | 1827 | 4.4752          |
| 13.1062       | 22.0    | 1914 | 4.4651          |
| 13.1325       | 23.0    | 2001 | 4.4568          |
| 13.0963       | 24.0    | 2088 | 4.4508          |
| 13.1318       | 25.0    | 2175 | 4.4443          |
| 12.8938       | 26.0    | 2262 | 4.4397          |
| 12.935        | 27.0    | 2349 | 4.4364          |
| 13.1248       | 28.0    | 2436 | 4.4331          |
| 12.9068       | 29.0    | 2523 | 4.4304          |
| 12.8866       | 30.0    | 2610 | 4.4293          |
| 12.9587       | 31.0    | 2697 | 4.4282          |
| 12.8039       | 32.0    | 2784 | 4.4273          |
| 12.7212       | 33.0    | 2871 | 4.4270          |
| 12.8857       | 34.0    | 2958 | 4.4268          |
| 34.5151       | 34.4863 | 3000 | 4.4268          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
Model size

  • 126M parameters (F32, Safetensors)