adhi29 committed
Commit c9e7a06
Parent: e14bebe

adhi29/openhermes-mistral-dpo-gptq
README.md CHANGED
@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6522
- - Rewards/chosen: 0.0746
- - Rewards/rejected: 0.0188
- - Rewards/accuracies: 0.5625
- - Rewards/margins: 0.0558
- - Logps/rejected: -85.2534
- - Logps/chosen: -70.1958
- - Logits/rejected: -2.6062
- - Logits/chosen: -2.5855
+ - Loss: 0.9866
+ - Rewards/chosen: 0.2715
+ - Rewards/rejected: 0.5084
+ - Rewards/accuracies: 0.625
+ - Rewards/margins: -0.2369
+ - Logps/rejected: -217.0748
+ - Logps/chosen: -192.7873
+ - Logits/rejected: -2.1497
+ - Logits/chosen: -2.0212

  ## Model description

@@ -51,18 +51,17 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 2
- - training_steps: 50
+ - training_steps: 40
  - mixed_precision_training: Native AMP

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.6707 | 0.01 | 10 | 0.6784 | 0.0243 | 0.0096 | 0.375 | 0.0147 | -85.3454 | -70.6989 | -2.6043 | -2.5836 |
- | 0.6955 | 0.01 | 20 | 0.6748 | 0.0239 | 0.0210 | 0.3125 | 0.0029 | -85.2314 | -70.7031 | -2.6048 | -2.5849 |
- | 0.7423 | 0.01 | 30 | 0.6740 | 0.0542 | 0.0384 | 0.5625 | 0.0158 | -85.0572 | -70.4001 | -2.6053 | -2.5858 |
- | 0.7023 | 0.02 | 40 | 0.6620 | 0.0731 | 0.0323 | 0.5625 | 0.0407 | -85.1185 | -70.2116 | -2.6061 | -2.5863 |
- | 0.685 | 0.03 | 50 | 0.6522 | 0.0746 | 0.0188 | 0.5625 | 0.0558 | -85.2534 | -70.1958 | -2.6062 | -2.5855 |
+ | 0.6752 | 0.01 | 10 | 0.7338 | 0.0443 | 0.0693 | 0.875 | -0.0250 | -221.4665 | -195.0593 | -2.1454 | -2.0106 |
+ | 0.71 | 0.01 | 20 | 0.7099 | 0.0825 | 0.0676 | 0.875 | 0.0149 | -221.4828 | -194.6768 | -2.1435 | -2.0127 |
+ | 0.6938 | 0.01 | 30 | 0.8421 | 0.1926 | 0.3222 | 0.625 | -0.1296 | -218.9368 | -193.5758 | -2.1482 | -2.0177 |
+ | 0.6923 | 0.02 | 40 | 0.9866 | 0.2715 | 0.5084 | 0.625 | -0.2369 | -217.0748 | -192.7873 | -2.1497 | -2.0212 |


  ### Framework versions
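The Loss and Rewards/* columns in the results above are defined by the DPO objective: each reward is a β-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected reward margin. A minimal sketch of that relationship (β and the log-probability inputs below are illustrative values, not taken from this run):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards: beta-scaled log-prob ratios of policy vs. reference model
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = r_chosen - r_rejected
    # DPO loss: negative log-sigmoid of the reward margin (Bradley-Terry objective)
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, r_chosen, r_rejected, margin

loss, r_c, r_r, margin = dpo_loss(-70.0, -85.0, -70.5, -84.5, beta=0.1)
```

At a margin of zero the loss is ln 2 ≈ 0.6931, which is why the training-loss column starts near 0.69 before the policy drifts from the reference.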
adapter_config.json CHANGED
@@ -19,8 +19,8 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "q_proj",
- "v_proj"
+ "v_proj",
+ "q_proj"
  ],
  "task_type": "CAUSAL_LM"
  }
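The only edit to adapter_config.json above is the ordering of target_modules; as a set, the LoRA-adapted projections (q_proj and v_proj) are unchanged, so the adapter attaches to the same attention layers. A quick check of that equivalence:

```python
import json

# The two sides of the adapter_config.json diff, reduced to the changed field
old_cfg = json.loads('{"target_modules": ["q_proj", "v_proj"]}')
new_cfg = json.loads('{"target_modules": ["v_proj", "q_proj"]}')

# List order differs, but the set of targeted projection layers is identical
same_targets = set(old_cfg["target_modules"]) == set(new_cfg["target_modules"])
```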
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ed2bd803830f05aed17991050c376ef747a8829b0d969d4141a8810e47adaa2b
+ oid sha256:c98da1863b9dadf52f7f528d729e4c15130e1da81c318696bd8b3a409507ab5b
  size 13648432
runs/Jan09_03-58-46_e1dc5e887cb9/events.out.tfevents.1704772817.e1dc5e887cb9.264.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2acafb1886624c72c035a78bb9c445d90a30c3e9e02c835d76e23f8eea7c2ab6
+ size 11244
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:accf2b53d9c6289fc381d015c0585ad7bcbb44c69f6389d73604727ffdd55f6c
+ oid sha256:1b99c58734a5e5e9083464ec5d565517cbaf9a7e4545a53ff8e9a45bab558ed9
  size 4155
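The .safetensors, .bin, and tfevents entries in this commit are Git LFS pointer files rather than the binaries themselves: three-line stanzas giving the spec version, a sha256 object id, and the byte size. A minimal parser sketch for that format (the sample pointer is copied from the training_args.bin entry above):

```python
def parse_lfs_pointer(text: str) -> dict:
    # Each line of a Git LFS pointer is "key value"; split on the first space
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1b99c58734a5e5e9083464ec5d565517cbaf9a7e4545a53ff8e9a45bab558ed9
size 4155"""

fields = parse_lfs_pointer(pointer)
```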