ZoninSh/openhermes-mistral-dpo-gpt

Files changed (5)

README.md +16 -15
adapter_config.json +6 -1
adapter_model.safetensors +2 -2
runs/Nov18_15-32-12_056656b68ebe/events.out.tfevents.1700321570.056656b68ebe.176.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -15,15 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6047
-- Rewards/chosen: -0.0917
-- Rewards/rejected: -4.9453
-- Rewards/accuracies: 0.5625
-- Rewards/margins: 4.8536
-- Logps/rejected: -466.4970
-- Logps/chosen: -173.4612
-- Logits/rejected: -1.5579
-- Logits/chosen: -1.5934
 ## Model description
@@ -49,18 +49,19 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- training_steps: 50
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.8027        | 0.01  | 10   | 0.8159          | 0.0214         | 0.4647           | 0.1875             | -0.4432         | -412.3975      | -172.3304    | -1.6414         | -1.6412       |
-| 1.1614        | 0.01  | 20   | 0.9902          | 0.1251         | 1.0701           | 0.3125             | -0.9450         | -406.3427      | -171.2931    | -1.6348         | -1.6337       |
-| 0.7986        | 0.01  | 30   | 0.6237          | 0.0767         | -0.6890          | 0.5625             | 0.7657          | -423.9343      | -171.7779    | -1.6135         | -1.6290       |
-| 7.3338        | 0.02  | 40   | 0.6306          | -0.1257        | -4.7694          | 0.5625             | 4.6437          | -464.7380      | -173.8018    | -1.5634         | -1.5933       |
-| 2.144         | 0.03  | 50   | 0.6047          | -0.0917        | -4.9453          | 0.5625             | 4.8536          | -466.4970      | -173.4612    | -1.5579         | -1.5934       |
 ### Framework versions

 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 3.2500
+- Rewards/chosen: -1.0975
+- Rewards/rejected: -1.6306
+- Rewards/accuracies: 0.625
+- Rewards/margins: 0.5331
+- Logps/rejected: -307.3866
+- Logps/chosen: -331.8629
+- Logits/rejected: -2.4077
+- Logits/chosen: -2.3038
 ## Model description
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
+- training_steps: 300
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 4.2921        | 0.03  | 50   | 9.8028          | -5.3862        | 0.1060           | 0.1875             | -5.4922         | -290.0201      | -374.7499    | -2.2861         | -2.1795       |
+| 9.75          | 0.05  | 100  | 8.8191          | -12.7493       | -8.6505          | 0.3125             | -4.0989         | -377.5849      | -448.3811    | -2.2836         | -2.2309       |
+| 3.2104        | 0.07  | 150  | 0.8915          | -3.5710        | -6.0350          | 0.375              | 2.4640          | -351.4305      | -356.5982    | -2.6543         | -2.5955       |
+| 2.655         | 0.1   | 200  | 0.3207          | -1.0209        | -4.6027          | 0.6875             | 3.5818          | -337.1074      | -331.0971    | -2.4341         | -2.3534       |
+| 4.8481        | 0.12  | 250  | 1.1311          | -0.8147        | -2.3072          | 0.625              | 1.4926          | -314.1525      | -329.0346    | -2.3257         | -2.2374       |
+| 3.1598        | 0.15  | 300  | 3.2500          | -1.0975        | -1.6306          | 0.625              | 0.5331          | -307.3866      | -331.8629    | -2.4077         | -2.3038       |
 ### Framework versions
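
The reward columns in the new table can be cross-checked against each other. The sketch below assumes that `Rewards/margins` is logged as the mean chosen reward minus the mean rejected reward, which is what the column names suggest but is not stated in the diff. The rows are copied from the new "Training results" table; the helper names and the tolerance are illustrative.

```python
# Rows copied from the new "Training results" table:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
rows = [
    (50, 9.8028, -5.3862, 0.1060, -5.4922),
    (100, 8.8191, -12.7493, -8.6505, -4.0989),
    (150, 0.8915, -3.5710, -6.0350, 2.4640),
    (200, 0.3207, -1.0209, -4.6027, 3.5818),
    (250, 1.1311, -0.8147, -2.3072, 1.4926),
    (300, 3.2500, -1.0975, -1.6306, 0.5331),
]

# Assumption: margin = chosen - rejected, up to rounding of the logged values.
TOLERANCE = 5e-4


def margin_gap(chosen, rejected, reported):
    """Absolute difference between the recomputed and the reported margin."""
    return abs((chosen - rejected) - reported)


max_gap = max(margin_gap(c, r, m) for _, _, c, r, m in rows)
consistent = max_gap < TOLERANCE

# Step with the lowest validation loss among the logged evaluations.
best_step = min(rows, key=lambda row: row[1])[0]

print(f"largest margin gap: {max_gap:.4f}, consistent: {consistent}")
print(f"lowest validation loss at step {best_step}")
```

Under that assumption every row agrees to within rounding. The same table also shows that the lowest logged validation loss is at step 200 (0.3207), while the README headline metrics report the final step 300 (3.2500).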

adapter_config.json CHANGED

@@ -17,7 +17,12 @@
   "revision": null,
   "target_modules": [
     "q_proj",
-    "v_proj"
   ],
   "task_type": "CAUSAL_LM"
 }

   "revision": null,
   "target_modules": [
     "q_proj",
+    "gate_proj",
+    "v_proj",
+    "down_proj",
+    "k_proj",
+    "o_proj",
+    "up_proj"
   ],
   "task_type": "CAUSAL_LM"
 }

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:27c863c0cb0c4e370c8bed63b8b7d4730fcaee7186769db06dde9ffd849b2dc8
-size 13648432

 version https://git-lfs.github.com/spec/v1
+oid sha256:86a7fed3a974dbcdb88d1e4ada81ee85076e9ff04a6f8e08dcf960ee07b71633
+size 83945296
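
The jump in adapter size is roughly what the wider `target_modules` list would predict. The following is a back-of-the-envelope sketch, not a statement of the actual training configuration: the LoRA rank (8) and the float32 storage are assumptions that do not appear in this diff, and the layer shapes are the standard Mistral-7B dimensions. The function name is hypothetical.

```python
# Assumptions (none of these appear in the diff):
# standard Mistral-7B shapes, LoRA rank 8, adapter weights stored as float32.
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dimension
INTERMEDIATE = 14336
NUM_LAYERS = 32
RANK = 8
BYTES_PER_PARAM = 4

# (in_features, out_features) of each projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_tensor_bytes(target_modules):
    """Bytes used by the LoRA A and B matrices across all layers."""
    # Each adapted module adds A (rank x in) and B (out x rank).
    params_per_layer = sum(
        RANK * (SHAPES[name][0] + SHAPES[name][1]) for name in target_modules
    )
    return params_per_layer * NUM_LAYERS * BYTES_PER_PARAM


old_modules = ["q_proj", "v_proj"]
new_modules = ["q_proj", "gate_proj", "v_proj", "down_proj",
               "k_proj", "o_proj", "up_proj"]

old_bytes = lora_tensor_bytes(old_modules)
new_bytes = lora_tensor_bytes(new_modules)

# File sizes taken from the Git LFS pointers in this commit.
print(old_bytes, 13648432 - old_bytes)
print(new_bytes, 83945296 - new_bytes)
```

Under these assumptions the tensor data accounts for all but a few tens of kilobytes of each file, a remainder that would be plausible for the safetensors header. This suggests, but does not prove, that the size increase comes from the added target modules rather than from a change in rank.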

runs/Nov18_15-32-12_056656b68ebe/events.out.tfevents.1700321570.056656b68ebe.176.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7a7484c8de187593571c24ff14b4bf5479a5f7d6f885205fbd48e0a9c5d5296c
+size 14047

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d31571207586ff667fb56d4455c593cb794147063948e1b4dbb49b3da24bf221
 size 4155

 version https://git-lfs.github.com/spec/v1
+oid sha256:695023550747e11566513b0f3522c6ad468e4815fb86520c9add707c8f099aad
 size 4155