---
library_name: transformers
tags:
- generated_from_trainer
datasets:
- arrow
model-index:
- name: results5
  results: []
---

# results5

This model is a fine-tuned version of an unspecified base model on the arrow dataset.
It achieves the following results on the evaluation set:
- Loss: 2.2934

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 256
- total_eval_batch_size: 256
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- training_steps: 55177

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 6.314         | 0.0906 | 5000  | 6.2932          |
| 5.2109        | 0.1812 | 10000 | 4.9949          |
| 3.2468        | 0.2718 | 15000 | 3.0871          |
| 2.922         | 0.3625 | 20000 | 2.7691          |
| 2.7663        | 0.4531 | 25000 | 2.6115          |
| 2.6594        | 0.5437 | 30000 | 2.5044          |
| 2.5777        | 0.6343 | 35000 | 2.4308          |
| 2.5149        | 0.7249 | 40000 | 2.3741          |
| 2.4536        | 0.8155 | 45000 | 2.3367          |
| 2.4271        | 0.9062 | 50000 | 2.3103          |
| 2.4554        | 0.9968 | 55000 | 2.2934          |

### Framework versions

- Transformers 4.51.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0
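
### Reproducing the training configuration

For reference, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a minimal sketch, not the original training script: the `output_dir` value is an assumption (chosen to match the model name), and the two-GPU launch implied by `num_devices: 2` is noted only in comments.

```python
from transformers import TrainingArguments

# Minimal sketch of the configuration listed above.
# With 2 GPUs (e.g. launched via `torchrun --nproc_per_node=2 train.py`),
# a per-device batch size of 128 gives the total train/eval batch size of 256.
training_args = TrainingArguments(
    output_dir="results5",            # assumed; matches the model name
    learning_rate=1e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch",              # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    max_steps=55_177,
)
```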
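The linear schedule with 10000 warmup steps means the learning rate ramps from 0 to 1e-4 over the first 10000 steps, then decays linearly to 0 at step 55177. The sketch below illustrates that shape; it mirrors the behavior of a standard linear warmup-and-decay schedule and is an illustration, not code from this training run.

```python
def linear_lr(step: int,
              peak_lr: float = 1e-4,
              warmup_steps: int = 10_000,
              total_steps: int = 55_177) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps.
    return peak_lr * max(0, total_steps - step) / (total_steps - warmup_steps)

# e.g. linear_lr(5_000) == 5e-05 (mid-warmup); linear_lr(55_177) == 0.0
```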