Llama-3.1-8B-Instruct-SAA-500

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_500 dataset. It achieves the following results on the evaluation set:

Loss: 0.1156
Rewards/chosen: -0.0083
Rewards/rejected: -0.0476
Rewards/accuracies: 0.8400
Rewards/margins: 0.0394
Logps/rejected: -0.4763
Logps/chosen: -0.0828
Logits/rejected: -0.4102
Logits/chosen: -0.3506
Sft Loss: 0.0123
Odds Ratio Loss: 1.0330

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Sft Loss	Odds Ratio Loss
1.1636	1.7778	50	1.0705	-0.1023	-0.1440	0.8600	0.0417	-1.4401	-1.0227	-0.4901	-0.4152	0.1264	9.4410
0.2533	3.5556	100	0.1969	-0.0162	-0.0598	0.8200	0.0436	-0.5976	-0.1621	-0.4620	-0.3943	0.0207	1.7615
0.123	5.3333	150	0.1220	-0.0088	-0.0489	0.8600	0.0401	-0.4891	-0.0879	-0.4159	-0.3564	0.0128	1.0917
0.1442	7.1111	200	0.1198	-0.0086	-0.0468	0.8200	0.0382	-0.4680	-0.0861	-0.4111	-0.3521	0.0128	1.0702
0.1387	8.8889	250	0.1156	-0.0083	-0.0476	0.8400	0.0394	-0.4763	-0.0828	-0.4102	-0.3506	0.0123	1.0330

Framework versions

PEFT 0.12.0
Transformers 4.45.2
Pytorch 2.3.0
Datasets 2.19.0
Tokenizers 0.20.0

chchen
/

Llama-3.1-8B-Instruct-SAA-500

Llama-3.1-8B-Instruct-SAA-500

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for chchen/Llama-3.1-8B-Instruct-SAA-500

Evaluation results