	---
	license: apache-2.0
	base_model: mistralai/Mistral-7B-Instruct-v0.2
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: mistralit2_1000_STEPS_1e7_rate_03_beta_DPO
	results: []
	---


	# mistralit2_1000_STEPS_1e7_rate_03_beta_DPO

	This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained with DPO via TRL (per the model tags) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6191
	- Rewards/chosen: -1.8431
	- Rewards/rejected: -2.7054
	- Rewards/accuracies: 0.6505
	- Rewards/margins: 0.8623
	- Logps/rejected: -37.5904
	- Logps/chosen: -29.5295
	- Logits/rejected: -2.8238
	- Logits/chosen: -2.8242
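As a rough sanity check on the metrics above, the sketch below relates the reward and log-probability figures to one another. It assumes the usual DPO convention in which the implicit reward is `beta * (policy_logp - reference_logp)` and the margin is the chosen reward minus the rejected reward. The value `beta = 0.3` is an assumption read off the `03_beta` part of the model name; the card does not state it, and `implied_reference_logp` is a hypothetical helper written for this illustration.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -1.8431
rewards_rejected = -2.7054
reported_margin = 0.8623
logps_chosen = -29.5295
logps_rejected = -37.5904

# ASSUMPTION: beta = 0.3, inferred from "03_beta" in the model name.
BETA = 0.3

# The margin is the difference between the two mean rewards.
margin = rewards_chosen - rewards_rejected


def implied_reference_logp(policy_logp, reward, beta=BETA):
    """Invert reward = beta * (policy_logp - reference_logp)."""
    return policy_logp - reward / beta


ref_chosen = implied_reference_logp(logps_chosen, rewards_chosen)
ref_rejected = implied_reference_logp(logps_rejected, rewards_rejected)

print(f"margin: {margin:.4f} (reported {reported_margin})")
print(f"implied reference logp, chosen:   {ref_chosen:.4f}")
print(f"implied reference logp, rejected: {ref_rejected:.4f}")
```

Under that assumption, both rewards being negative means the fine-tuned model assigns lower log-probability than the reference to chosen and rejected responses alike; the margin is positive because the rejected responses dropped further. For scale, a DPO loss of ln 2 ≈ 0.693 corresponds to a zero margin, so 0.6191 is a modest improvement over the starting point.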

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-07
	- train_batch_size: 4
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1000
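To make the batch-size and scheduler settings concrete, here is a minimal sketch of how they combine. It assumes a single device (consistent with 4 × 2 = 8) and the common linear-warmup-then-cosine-decay shape; the exact scheduler implementation used for this run is not given in the card, and the function names are illustrative only.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-7
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# ASSUMPTION: one device, which is what makes 4 * 2 equal the reported 8.
NUM_DEVICES = 1


def effective_batch_size():
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.3e}")
```

With these settings the learning rate peaks at step 100 and has decayed to roughly zero by step 1000, which may partly explain why the evaluation metrics barely move over the last few hundred steps.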

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6777 | 0.1 | 50 | 0.6740 | -0.1496 | -0.1942 | 0.5824 | 0.0446 | -29.2197 | -23.8845 | -2.8632 | -2.8635 |
	| 0.6077 | 0.2 | 100 | 0.6364 | -1.2703 | -1.6253 | 0.5846 | 0.3550 | -33.9902 | -27.6202 | -2.8384 | -2.8387 |
	| 0.4959 | 0.29 | 150 | 0.6488 | -2.0038 | -2.5512 | 0.5934 | 0.5473 | -37.0763 | -30.0653 | -2.8343 | -2.8347 |
	| 0.553 | 0.39 | 200 | 0.5977 | -0.9571 | -1.3986 | 0.6374 | 0.4415 | -33.2344 | -26.5762 | -2.8518 | -2.8521 |
	| 0.6334 | 0.49 | 250 | 0.5740 | -0.6757 | -1.1710 | 0.6440 | 0.4953 | -32.4758 | -25.6382 | -2.8479 | -2.8482 |
	| 0.5613 | 0.59 | 300 | 0.5961 | -1.4901 | -2.1568 | 0.6374 | 0.6666 | -35.7616 | -28.3529 | -2.8436 | -2.8439 |
	| 0.5182 | 0.68 | 350 | 0.6175 | -1.8099 | -2.5639 | 0.6418 | 0.7541 | -37.1189 | -29.4187 | -2.8403 | -2.8407 |
	| 0.6292 | 0.78 | 400 | 0.6197 | -1.8949 | -2.6751 | 0.6418 | 0.7802 | -37.4896 | -29.7022 | -2.8352 | -2.8356 |
	| 0.6529 | 0.88 | 450 | 0.5986 | -1.3908 | -2.0689 | 0.6527 | 0.6781 | -35.4687 | -28.0218 | -2.8394 | -2.8398 |
	| 0.5042 | 0.98 | 500 | 0.5930 | -1.2223 | -1.8903 | 0.6637 | 0.6680 | -34.8735 | -27.4602 | -2.8391 | -2.8395 |
	| 0.364 | 1.07 | 550 | 0.5917 | -1.3579 | -2.0905 | 0.6659 | 0.7327 | -35.5409 | -27.9120 | -2.8340 | -2.8344 |
	| 0.346 | 1.17 | 600 | 0.6084 | -1.6411 | -2.4313 | 0.6527 | 0.7903 | -36.6769 | -28.8561 | -2.8286 | -2.8291 |
	| 0.4524 | 1.27 | 650 | 0.6120 | -1.7303 | -2.5496 | 0.6484 | 0.8192 | -37.0710 | -29.1536 | -2.8265 | -2.8269 |
	| 0.3422 | 1.37 | 700 | 0.6172 | -1.7895 | -2.6271 | 0.6505 | 0.8376 | -37.3293 | -29.3507 | -2.8252 | -2.8257 |
	| 0.2776 | 1.46 | 750 | 0.6164 | -1.8100 | -2.6641 | 0.6462 | 0.8541 | -37.4528 | -29.4193 | -2.8245 | -2.8249 |
	| 0.3599 | 1.56 | 800 | 0.6201 | -1.8360 | -2.6887 | 0.6484 | 0.8527 | -37.5348 | -29.5057 | -2.8241 | -2.8246 |
	| 0.4059 | 1.66 | 850 | 0.6205 | -1.8421 | -2.6971 | 0.6440 | 0.8550 | -37.5629 | -29.5263 | -2.8241 | -2.8246 |
	| 0.3417 | 1.76 | 900 | 0.6190 | -1.8389 | -2.6983 | 0.6505 | 0.8594 | -37.5666 | -29.5155 | -2.8239 | -2.8243 |
	| 0.3409 | 1.86 | 950 | 0.6195 | -1.8423 | -2.7030 | 0.6484 | 0.8606 | -37.5823 | -29.5270 | -2.8237 | -2.8242 |
	| 0.2802 | 1.95 | 1000 | 0.6191 | -1.8431 | -2.7054 | 0.6505 | 0.8623 | -37.5904 | -29.5295 | -2.8238 | -2.8242 |
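The evaluation columns do not all peak at the same step, so which checkpoint counts as "best" depends on the criterion. The sketch below copies four columns from the table and ranks the rows by three possible criteria. It is only an illustration of how one might read the table; the card does not say which criterion, if any, was used to pick the published weights.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table above.
ROWS = [
    (50, 0.6740, 0.5824, 0.0446),
    (100, 0.6364, 0.5846, 0.3550),
    (150, 0.6488, 0.5934, 0.5473),
    (200, 0.5977, 0.6374, 0.4415),
    (250, 0.5740, 0.6440, 0.4953),
    (300, 0.5961, 0.6374, 0.6666),
    (350, 0.6175, 0.6418, 0.7541),
    (400, 0.6197, 0.6418, 0.7802),
    (450, 0.5986, 0.6527, 0.6781),
    (500, 0.5930, 0.6637, 0.6680),
    (550, 0.5917, 0.6659, 0.7327),
    (600, 0.6084, 0.6527, 0.7903),
    (650, 0.6120, 0.6484, 0.8192),
    (700, 0.6172, 0.6505, 0.8376),
    (750, 0.6164, 0.6462, 0.8541),
    (800, 0.6201, 0.6484, 0.8527),
    (850, 0.6205, 0.6440, 0.8550),
    (900, 0.6190, 0.6505, 0.8594),
    (950, 0.6195, 0.6484, 0.8606),
    (1000, 0.6191, 0.6505, 0.8623),
]

# Pick the best row under each criterion.
best_by_loss = min(ROWS, key=lambda row: row[1])
best_by_accuracy = max(ROWS, key=lambda row: row[2])
best_by_margin = max(ROWS, key=lambda row: row[3])

print("lowest validation loss:  step", best_by_loss[0], "loss", best_by_loss[1])
print("highest reward accuracy: step", best_by_accuracy[0], "accuracy", best_by_accuracy[2])
print("largest reward margin:   step", best_by_margin[0], "margin", best_by_margin[3])
```

The final row (step 1000) matches the headline evaluation results and has the largest reward margin, but validation loss is lowest at step 250 and reward accuracy is highest at step 550. Training loss keeps falling after that point while validation loss drifts upward, which may indicate some overfitting in the second epoch.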


	### Framework versions

	- Transformers 4.38.2
	- PyTorch 2.0.0+cu117
	- Datasets 2.18.0
	- Tokenizers 0.15.2