train_wsc_1752870510

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the WSC (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5119
  • Num Input Tokens Seen: 490000
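Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct (see the framework versions below), it can be loaded by attaching the adapter to the base model with peft. The following is a minimal sketch, not verified usage from the author: it assumes the Hugging Face repo id rbelanec/train_wsc_1752870510, that you have access to the gated base model, and the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "rbelanec/train_wsc_1752870510"  # assumed adapter repo id

# Load the (gated) base model, then attach the fine-tuned PEFT adapter weights.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# Illustrative Winograd-schema-style question; not taken from the training data.
messages = [{"role": "user", "content": "The trophy doesn't fit in the suitcase because it is too large. What does 'it' refer to?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```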

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
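For orientation only, here is how these values map onto transformers.TrainingArguments. This is a reconstruction under assumptions: output_dir is a placeholder, the PEFT/LoRA configuration and dataset pipeline are not specified in this card, and every option not listed above is left at its library default.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above; output_dir is a placeholder and
# everything not listed in the card keeps the transformers default.
training_args = TrainingArguments(
    output_dir="train_wsc_1752870510",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```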

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 10.8483       | 0.504 | 63   | 10.6926         | 25504             |
| 8.5842        | 1.008 | 126  | 8.5112          | 49696             |
| 5.6943        | 1.512 | 189  | 5.6082          | 74112             |
| 3.3328        | 2.016 | 252  | 3.6076          | 99136             |
| 2.3983        | 2.52  | 315  | 2.3585          | 123904            |
| 1.6714        | 3.024 | 378  | 1.5518          | 148736            |
| 1.1345        | 3.528 | 441  | 1.0608          | 174432            |
| 0.7365        | 4.032 | 504  | 0.8147          | 198656            |
| 0.6532        | 4.536 | 567  | 0.7061          | 224032            |
| 0.586         | 5.04  | 630  | 0.6429          | 247424            |
| 0.5393        | 5.544 | 693  | 0.6077          | 271232            |
| 0.5291        | 6.048 | 756  | 0.5702          | 295728            |
| 0.543         | 6.552 | 819  | 0.5501          | 320464            |
| 0.5172        | 7.056 | 882  | 0.5288          | 345856            |
| 0.5124        | 7.56  | 945  | 0.5279          | 371040            |
| 0.496         | 8.064 | 1008 | 0.5196          | 395216            |
| 0.4845        | 8.568 | 1071 | 0.5164          | 419184            |
| 0.4689        | 9.072 | 1134 | 0.5162          | 444560            |
| 0.4882        | 9.576 | 1197 | 0.5119          | 469104            |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
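When reproducing results, it can help to confirm the local environment matches these pins. A small sketch, assuming the standard PyPI package names:

```python
# Compare locally installed versions against the ones reported in this card.
import importlib

expected = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.7.1+cu126",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}
for name, want in expected.items():
    have = importlib.import_module(name).__version__
    note = "" if have == want else "  <- differs from the card"
    print(f"{name}: {have} (card: {want}){note}")
```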