
Llama-2-7b-spin-rephrased-10k

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1071
  • Rewards/real: 10.2171
  • Rewards/generated: -7.6243
  • Rewards/accuracies: 1.0
  • Rewards/margins: 17.8413
  • Logps/generated: -358.9117
  • Logps/real: -104.6875
  • Logits/generated: -0.8781
  • Logits/real: -1.4494

Model description

More information needed

Intended uses & limitations

More information needed
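Until this section is filled in, the snippet below is only a minimal loading sketch. It assumes the checkpoint is published as AmberYifan/Llama-2-7b-spin-rephrased-10k and exposes the standard Llama-2 causal-LM interface; the prompt and generation settings are purely illustrative.

```python
# Minimal loading sketch (assumption: the checkpoint lives at
# "AmberYifan/Llama-2-7b-spin-rephrased-10k" and behaves like a
# standard Llama-2 causal LM).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Llama-2-7b-spin-rephrased-10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # published weights are stored in BF16
    device_map="auto",
)

prompt = "Summarize the idea of self-play fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```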

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
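
For illustration, the values above might be expressed with transformers.TrainingArguments roughly as sketched below. The actual training script is not part of this card, so treat the mapping (and the output path) as an assumption rather than the real configuration.

```python
# Hedged sketch: how the listed hyperparameters might map onto
# transformers.TrainingArguments (Transformers 4.43.3). Illustrative only;
# the real training script is not included in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-2-7b-spin-rephrased-10k",  # hypothetical output path
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # 4 per GPU x 4 GPUs x 2 accumulation steps = 32 total
    per_device_eval_batch_size=4,    # 4 per GPU x 4 GPUs = 16 total
    gradient_accumulation_steps=2,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # published weights are BF16
)
```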

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1687 | 0.1984 | 62 | 0.1554 | 5.2053 | -5.2548 | 1.0 | 10.4601 | -335.2168 | -154.8048 | -0.7218 | -0.4019 |
| 0.1204 | 0.3968 | 124 | 0.1153 | 9.3697 | -4.5235 | 1.0 | 13.8932 | -327.9041 | -113.1613 | -0.8262 | -1.1627 |
| 0.1114 | 0.5952 | 186 | 0.1125 | 9.6740 | -5.3166 | 1.0 | 14.9906 | -335.8354 | -110.1185 | -0.8446 | -1.2393 |
| 0.1094 | 0.7936 | 248 | 0.1110 | 9.8335 | -5.4853 | 1.0 | 15.3188 | -337.5219 | -108.5231 | -0.8538 | -1.2560 |
| 0.1115 | 0.9920 | 310 | 0.1100 | 9.9127 | -6.4827 | 1.0 | 16.3954 | -347.4966 | -107.7317 | -0.8658 | -1.3304 |
| 0.1046 | 1.1904 | 372 | 0.1093 | 9.9819 | -6.6707 | 1.0 | 16.6526 | -349.3765 | -107.0395 | -0.8656 | -1.3633 |
| 0.1067 | 1.3888 | 434 | 0.1089 | 10.0127 | -7.5740 | 1.0 | 17.5868 | -358.4094 | -106.7308 | -0.8814 | -1.3898 |
| 0.1038 | 1.5872 | 496 | 0.1083 | 10.0730 | -7.0038 | 1.0 | 17.0768 | -352.7069 | -106.1281 | -0.8755 | -1.3615 |
| 0.0996 | 1.7856 | 558 | 0.1079 | 10.1219 | -7.0176 | 1.0 | 17.1396 | -352.8456 | -105.6391 | -0.8467 | -1.3431 |
| 0.1058 | 1.9840 | 620 | 0.1077 | 10.1479 | -7.4808 | 1.0 | 17.6287 | -357.4770 | -105.3797 | -0.8821 | -1.4055 |
| 0.0995 | 2.1824 | 682 | 0.1074 | 10.1669 | -7.1947 | 1.0 | 17.3617 | -354.6166 | -105.1890 | -0.8781 | -1.4102 |
| 0.1017 | 2.3808 | 744 | 0.1073 | 10.1849 | -7.6243 | 1.0 | 17.8092 | -358.9117 | -105.0093 | -0.8806 | -1.4228 |
| 0.1031 | 2.5792 | 806 | 0.1072 | 10.2106 | -7.6581 | 1.0 | 17.8687 | -359.2500 | -104.7519 | -0.8787 | -1.4391 |
| 0.1025 | 2.7776 | 868 | 0.1071 | 10.2105 | -7.6804 | 1.0 | 17.8909 | -359.4730 | -104.7534 | -0.8824 | -1.4506 |
| 0.1067 | 2.9760 | 930 | 0.1071 | 10.2171 | -7.6243 | 1.0 | 17.8413 | -358.9117 | -104.6875 | -0.8781 | -1.4494 |
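
A note on the reward columns in the table above: Rewards/margins is the difference Rewards/real minus Rewards/generated (final row: 10.2171 − (−7.6243) ≈ 17.84). The rewards themselves are assumed here to follow the usual DPO/SPIN convention of a scaled log-probability ratio against the reference model; the card does not state the training objective, so the formula below is an assumption rather than a description of the actual loss.

$$
r_{\text{real}} = \beta\left(\log \pi_\theta(y_{\text{real}} \mid x) - \log \pi_{\text{ref}}(y_{\text{real}} \mid x)\right),
\qquad
\text{margin} = r_{\text{real}} - r_{\text{generated}}
$$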

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1