chchen/Llama-3.1-8B-Instruct-dpo-llama-1000

+---
+library_name: peft
+license: llama3.1
+base_model: meta-llama/Llama-3.1-8B-Instruct
+tags:
+- trl
+- dpo
+- llama-factory
+- generated_from_trainer
+model-index:
+- name: Llama-3.1-8B-Instruct-dpo-llama-1000
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# Llama-3.1-8B-Instruct-dpo-llama-1000
+This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.3613
+- Rewards/chosen: 1.3392
+- Rewards/rejected: -1.7432
+- Rewards/accuracies: 0.8400
+- Rewards/margins: 3.0824
+- Logps/chosen: -9.1017
+- Logps/rejected: -41.8256
+- Logits/chosen: -0.1378
+- Logits/rejected: -0.2410
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-06
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 16
+- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 10.0
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:------------:|:--------------:|:-------------:|:---------------:|
+| 0.6815        | 0.8889 | 50   | 0.6707          | 0.0833         | 0.0353           | 0.6900             | 0.0480          | -21.6601     | -24.0398       | -0.4114       | -0.4792         |
+| 0.5082        | 1.7778 | 100  | 0.4428          | 1.0308         | 0.1943           | 0.7900             | 0.8366          | -12.1855     | -22.4506       | -0.3559       | -0.4377         |
+| 0.2979        | 2.6667 | 150  | 0.3215          | 1.3481         | -0.4170          | 0.8600             | 1.7651          | -9.0131      | -28.5637       | -0.2695       | -0.3655         |
+| 0.2862        | 3.5556 | 200  | 0.3077          | 1.4814         | -0.7600          | 0.8500             | 2.2414          | -7.6796      | -31.9936       | -0.2154       | -0.3106         |
+| 0.2747        | 4.4444 | 250  | 0.3184          | 1.4147         | -1.2445          | 0.8600             | 2.6592          | -8.3466      | -36.8385       | -0.1872       | -0.2879         |
+| 0.2688        | 5.3333 | 300  | 0.3195          | 1.4469         | -1.2794          | 0.8500             | 2.7263          | -8.0242      | -37.1874       | -0.1714       | -0.2705         |
+| 0.2047        | 6.2222 | 350  | 0.3630          | 1.3019         | -1.5956          | 0.8400             | 2.8975          | -9.4749      | -40.3495       | -0.1553       | -0.2578         |
+| 0.2268        | 7.1111 | 400  | 0.3526          | 1.3609         | -1.6635          | 0.8500             | 3.0245          | -8.8842      | -41.0287       | -0.1452       | -0.2479         |
+| 0.1440        | 8.0000 | 450  | 0.3662          | 1.3488         | -1.7032          | 0.8400             | 3.0520          | -9.0059      | -41.4255       | -0.1421       | -0.2448         |
+| 0.1710        | 8.8889 | 500  | 0.3635          | 1.3313         | -1.7326          | 0.8400             | 3.0640          | -9.1805      | -41.7197       | -0.1399       | -0.2430         |
+| 0.2313        | 9.7778 | 550  | 0.3613          | 1.3392         | -1.7432          | 0.8400             | 3.0824          | -9.1017      | -41.8256       | -0.1378       | -0.2410         |
+### Framework versions
+- PEFT 0.12.0
+- Transformers 4.46.1
+- Pytorch 2.5.1+cu124
+- Datasets 3.1.0
+- Tokenizers 0.20.3
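
Two numbers in the card above can be re-derived from the others, which makes for a cheap sanity check when reviewing this diff. In TRL-style DPO logging, `Rewards/margins` is the chosen reward minus the rejected reward, and `total_train_batch_size` is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below recomputes both from values copied out of the table. The helper names are illustrative, and the device count of 1 is an assumption inferred from 2 × 8 = 16; the card does not state it.

```python
# Values copied from the training-results table:
# (step, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (50, 0.0833, 0.0353, 0.0480),
    (100, 1.0308, 0.1943, 0.8366),
    (150, 1.3481, -0.4170, 1.7651),
    (200, 1.4814, -0.7600, 2.2414),
    (250, 1.4147, -1.2445, 2.6592),
    (300, 1.4469, -1.2794, 2.7263),
    (350, 1.3019, -1.5956, 2.8975),
    (400, 1.3609, -1.6635, 3.0245),
    (450, 1.3488, -1.7032, 3.0520),
    (500, 1.3313, -1.7326, 3.0640),
    (550, 1.3392, -1.7432, 3.0824),
]


def margin_mismatches(rows, tol=1.5e-4):
    """Return the steps whose reported margin differs from chosen - rejected.

    The tolerance allows for the table being rounded to four decimals.
    """
    return [
        step
        for step, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


if __name__ == "__main__":
    print("steps with inconsistent margins:", margin_mismatches(ROWS))
    print("effective batch size:", effective_batch_size(2, 8))
```

An empty mismatch list means every row of the table is internally consistent up to rounding. It says nothing about whether the underlying evaluation data is sound, since the card lists the dataset as unknown.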

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7ce93f7ae2785c5ef2370c16c37beefe9e1e5ad81940f0901b4240ab58806f30
+oid sha256:4f07d7347324c4456569dd8925d33df733dd8d1e2f6a661e8c26fec67edeafd9
 size 83945296
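
The pointer change above only swaps the `oid`; the `size` stays at 83945296 bytes, which would be consistent with an adapter of the same shape holding different weights. A downloaded `adapter_model.safetensors` can be checked against the new pointer by comparing its byte size and SHA-256 digest. The following is a minimal sketch using only the Python standard library; the function names and the local file path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path

# New pointer contents, copied from the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4f07d7347324c4456569dd8925d33df733dd8d1e2f6a661e8c26fec67edeafd9
size 83945296
"""


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("adapter_model.safetensors", parse_lfs_pointer(NEW_POINTER))` would return `True` only for a local copy that corresponds to the new revision, assuming the file sits at that hypothetical path.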