de_childes_42

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.1897
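
If this reported loss is the usual mean per-token cross-entropy (in nats) of a language model, which the auto-generated card does not confirm, it corresponds to a validation perplexity of roughly exp(4.1897) ≈ 66:

```python
import math

# Perplexity from mean cross-entropy loss, assuming the reported
# eval loss is per-token cross-entropy in nats (unconfirmed by the card).
eval_loss = 4.1897
print(math.exp(eval_loss))  # ~66.0
```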

Model description

More information needed

Intended uses & limitations

More information needed
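
No usage details are provided. As a minimal sketch, assuming the checkpoint is a causal language model published on the Hugging Face Hub (the repo id `your-username/de_childes_42` below is a placeholder, and the German prompt is only a guess from the model name), it could be loaded like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
repo_id = "your-username/de_childes_42"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# German prompt is an assumption based on the model name; unverified.
inputs = tokenizer("Das ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```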

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
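
A minimal sketch of how these settings map onto transformers TrainingArguments; the output directory is a placeholder, and the model/dataset wiring from the original training script is not shown in the card:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 match the TrainingArguments defaults.
training_args = TrainingArguments(
    output_dir="de_childes_42",      # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                       # "Native AMP" mixed precision
)
```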

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.5021  | 2000   | 7.0336          |
| 6.9417        | 3.0041  | 4000   | 5.8112          |
| 6.9417        | 4.5062  | 6000   | 5.4586          |
| 5.2077        | 6.0083  | 8000   | 5.1717          |
| 5.2077        | 7.5103  | 10000  | 4.9583          |
| 4.7315        | 9.0124  | 12000  | 4.8002          |
| 4.7315        | 10.5145 | 14000  | 4.6706          |
| 4.4253        | 12.0165 | 16000  | 4.5552          |
| 4.4253        | 13.5186 | 18000  | 4.4522          |
| 4.1962        | 15.0207 | 20000  | 4.3739          |
| 4.1962        | 16.5227 | 22000  | 4.2925          |
| 4.0105        | 18.0248 | 24000  | 4.2277          |
| 4.0105        | 19.5268 | 26000  | 4.1706          |
| 3.857         | 21.0289 | 28000  | 4.1245          |
| 3.857         | 22.5310 | 30000  | 4.0846          |
| 3.7298        | 24.0330 | 32000  | 4.0589          |
| 3.7298        | 25.5351 | 34000  | 4.0266          |
| 3.6218        | 27.0372 | 36000  | 4.0028          |
| 3.6218        | 28.5392 | 38000  | 3.9863          |
| 3.5278        | 30.0413 | 40000  | 3.9732          |
| 3.5278        | 31.5434 | 42000  | 3.9720          |
| 3.4351        | 33.0454 | 44000  | 3.9599          |
| 3.4351        | 34.5475 | 46000  | 3.9566          |
| 3.3444        | 36.0496 | 48000  | 3.9572          |
| 3.3444        | 37.5516 | 50000  | 3.9680          |
| 3.2651        | 39.0537 | 52000  | 3.9788          |
| 3.2651        | 40.5558 | 54000  | 3.9815          |
| 3.1966        | 42.0578 | 56000  | 3.9928          |
| 3.1966        | 43.5599 | 58000  | 4.0061          |
| 3.1344        | 45.0620 | 60000  | 4.0126          |
| 3.1344        | 46.5640 | 62000  | 4.0198          |
| 3.0785        | 48.0661 | 64000  | 4.0377          |
| 3.0785        | 49.5682 | 66000  | 4.0502          |
| 3.0287        | 51.0702 | 68000  | 4.0644          |
| 3.0287        | 52.5723 | 70000  | 4.0714          |
| 2.9837        | 54.0744 | 72000  | 4.0852          |
| 2.9837        | 55.5764 | 74000  | 4.0964          |
| 2.9422        | 57.0785 | 76000  | 4.1148          |
| 2.9422        | 58.5805 | 78000  | 4.1221          |
| 2.9052        | 60.0826 | 80000  | 4.1276          |
| 2.9052        | 61.5847 | 82000  | 4.1346          |
| 2.8708        | 63.0867 | 84000  | 4.1505          |
| 2.8708        | 64.5888 | 86000  | 4.1574          |
| 2.839         | 66.0909 | 88000  | 4.1675          |
| 2.839         | 67.5929 | 90000  | 4.1727          |
| 2.8117        | 69.0950 | 92000  | 4.1767          |
| 2.8117        | 70.5971 | 94000  | 4.1823          |
| 2.7886        | 72.0991 | 96000  | 4.1867          |
| 2.7886        | 73.6012 | 98000  | 4.1872          |
| 2.768         | 75.1033 | 100000 | 4.1897          |
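
Validation loss reaches its minimum (3.9566) around step 46000 (epoch ~34.5) and rises steadily afterwards while training loss keeps falling, the usual signature of overfitting; by validation loss, the final checkpoint at step 100000 is therefore not the best one.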

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model details

  • Format: Safetensors
  • Parameters: 12.7M
  • Tensor type: F32