impossible-llms-english-mirror-reversal

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0171

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
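
The effective batch size follows from train_batch_size × num_devices × gradient_accumulation_steps = 12 × 4 × 8 = 384. As a rough sketch (not the original training script), the settings above correspond to a Hugging Face `TrainingArguments` configuration along these lines; the output directory is a placeholder, and only the numeric values come from this card:

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments object matching the hyperparameters listed above.
# Only the numeric values are taken from the card; paths and trainer wiring are illustrative.
training_args = TrainingArguments(
    output_dir="impossible-llms-english-mirror-reversal",  # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,   # 12 per device x 4 GPUs x 8 steps = 384 effective batch
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                       # "Native AMP" mixed-precision training
    label_smoothing_factor=0.1,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-08 (defaults)
)
```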

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|---------------|---------|------|-----------------|
| 21.0651       | 1.0     | 93   | 6.9366          |
| 16.7721       | 2.0     | 186  | 5.6109          |
| 16.2007       | 3.0     | 279  | 5.3989          |
| 15.362        | 4.0     | 372  | 5.0672          |
| 14.777        | 5.0     | 465  | 4.8375          |
| 14.1948       | 6.0     | 558  | 4.6671          |
| 13.7877       | 7.0     | 651  | 4.5429          |
| 13.4306       | 8.0     | 744  | 4.4433          |
| 13.2006       | 9.0     | 837  | 4.3635          |
| 12.9023       | 10.0    | 930  | 4.2974          |
| 12.926        | 11.0    | 1023 | 4.2489          |
| 12.7253       | 12.0    | 1116 | 4.2058          |
| 12.592        | 13.0    | 1209 | 4.1736          |
| 12.3876       | 14.0    | 1302 | 4.1453          |
| 12.2837       | 15.0    | 1395 | 4.1236          |
| 12.1655       | 16.0    | 1488 | 4.1020          |
| 12.1549       | 17.0    | 1581 | 4.0871          |
| 12.0255       | 18.0    | 1674 | 4.0723          |
| 12.0603       | 19.0    | 1767 | 4.0624          |
| 11.9875       | 20.0    | 1860 | 4.0519          |
| 11.766        | 21.0    | 1953 | 4.0446          |
| 12.0245       | 22.0    | 2046 | 4.0389          |
| 11.9487       | 23.0    | 2139 | 4.0326          |
| 11.6863       | 24.0    | 2232 | 4.0286          |
| 11.731        | 25.0    | 2325 | 4.0251          |
| 11.7887       | 26.0    | 2418 | 4.0217          |
| 11.8313       | 27.0    | 2511 | 4.0198          |
| 11.5967       | 28.0    | 2604 | 4.0185          |
| 11.5744       | 29.0    | 2697 | 4.0179          |
| 11.4695       | 30.0    | 2790 | 4.0173          |
| 11.6968       | 31.0    | 2883 | 4.0170          |
| 11.6475       | 32.0    | 2976 | 4.0171          |
| 30.9823       | 32.2598 | 3000 | 4.0171          |
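
For reference, the final validation loss of 4.0171 nats corresponds to a perplexity of roughly exp(4.0171) ≈ 55.5. Because training used a label smoothing factor of 0.1, the reported loss is the smoothed objective rather than the raw negative log-likelihood, so this figure is only indicative:

```python
import math

# Approximate perplexity implied by the final validation loss.
# The loss includes label smoothing (0.1), so treat this as a rough estimate.
final_val_loss = 4.0171
print(math.exp(final_val_loss))  # ~55.5
```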

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
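
The card does not state the base architecture. Assuming the ~126M-parameter F32 safetensors checkpoint on the Hub (IParraMartin/impossible-llms-english-mirror-reversal) is a standard causal language model, it should load with the usual Auto classes; this is a sketch, not an official usage snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes a causal-LM checkpoint; the exact architecture is not documented in this card.
repo_id = "IParraMartin/impossible-llms-english-mirror-reversal"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Example input text", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```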