impossible-llms-english-fronting-n

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.3357
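If the reported evaluation loss is a mean per-token cross-entropy in nats (the usual convention for causal language models), it corresponds to a perplexity of roughly exp(4.3357) ≈ 76.4. Note that the label smoothing of 0.1 used in training inflates the reported loss relative to pure cross-entropy, so the true perplexity is likely somewhat lower. A quick sketch:

```python
import math

# Assumption: eval loss is mean per-token cross-entropy in nats.
# Label smoothing (0.1 here) inflates the loss, so this is a rough
# upper estimate of the model's perplexity.
eval_loss = 4.3357
perplexity = math.exp(eval_loss)
print(f"perplexity ~= {perplexity:.1f}")  # ~= 76.4
```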

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
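The batch-size fields above follow the usual Trainer-style semantics (an assumption, since the training script is not shown): the total train batch size is the per-device batch size times the gradient-accumulation steps times the number of devices, and the warmup ratio is applied to the total number of training steps. A quick sanity check:

```python
# Sanity-check the effective batch size implied by the hyperparameters above
# (assumed Trainer semantics: per-device batch x grad accumulation x devices).
train_batch_size = 12           # per device
gradient_accumulation_steps = 8
num_devices = 4

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)   # 384, matching total_train_batch_size above

# 10% warmup of 3000 training steps:
warmup_steps = int(0.1 * 3000)
print(warmup_steps)             # 300
```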

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 22.1846       | 1.0     | 87   | 7.3020          |
| 17.3675       | 2.0     | 174  | 5.8002          |
| 16.8631       | 3.0     | 261  | 5.5977          |
| 16.4538       | 4.0     | 348  | 5.4003          |
| 15.6803       | 5.0     | 435  | 5.1926          |
| 15.1404       | 6.0     | 522  | 5.0324          |
| 14.7478       | 7.0     | 609  | 4.9024          |
| 14.5035       | 8.0     | 696  | 4.7975          |
| 14.235        | 9.0     | 783  | 4.7188          |
| 13.9852       | 10.0    | 870  | 4.6540          |
| 13.8877       | 11.0    | 957  | 4.5981          |
| 13.6711       | 12.0    | 1044 | 4.5553          |
| 13.5173       | 13.0    | 1131 | 4.5190          |
| 13.4679       | 14.0    | 1218 | 4.4875          |
| 13.2962       | 15.0    | 1305 | 4.4638          |
| 13.2535       | 16.0    | 1392 | 4.4402          |
| 13.1309       | 17.0    | 1479 | 4.4224          |
| 13.0784       | 18.0    | 1566 | 4.4073          |
| 12.8672       | 19.0    | 1653 | 4.3936          |
| 12.9824       | 20.0    | 1740 | 4.3828          |
| 12.9498       | 21.0    | 1827 | 4.3739          |
| 12.7717       | 22.0    | 1914 | 4.3650          |
| 12.8094       | 23.0    | 2001 | 4.3585          |
| 12.7595       | 24.0    | 2088 | 4.3541          |
| 12.7992       | 25.0    | 2175 | 4.3483          |
| 12.5838       | 26.0    | 2262 | 4.3455          |
| 12.6073       | 27.0    | 2349 | 4.3425          |
| 12.7835       | 28.0    | 2436 | 4.3396          |
| 12.5611       | 29.0    | 2523 | 4.3382          |
| 12.5633       | 30.0    | 2610 | 4.3376          |
| 12.645        | 31.0    | 2697 | 4.3367          |
| 12.4761       | 32.0    | 2784 | 4.3362          |
| 12.4169       | 33.0    | 2871 | 4.3359          |
| 12.5719       | 34.0    | 2958 | 4.3357          |
| 33.6696       | 34.4863 | 3000 | 4.3357          |
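The validation loss flattening out in the final epochs is consistent with the cosine schedule decaying the learning rate toward zero by step 3000. A minimal sketch of a cosine schedule with linear warmup over the first 10% of steps (an illustration of the assumed behavior of lr_scheduler_type: cosine with warmup_ratio: 0.1; the lr_at helper is hypothetical):

```python
import math

def lr_at(step: int, base_lr: float = 1e-4,
          warmup_steps: int = 300, total_steps: int = 3000) -> float:
    """Cosine learning-rate schedule with linear warmup (illustrative)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps                      # linear warmup
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay

print(lr_at(300))   # peak learning rate: 1e-4
print(lr_at(3000))  # decays to ~0 at the final step
```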

Framework versions

  • Transformers 4.49.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model size

  • 126M parameters (F32, Safetensors)
