myewon committed on
Commit 8183533 · verified · 1 Parent(s): efdc1cf

myewon/instruction_result2

README.md CHANGED
@@ -15,9 +15,9 @@ should probably proofread and complete it, then remove this comment. -->
 
  # instruction_tuned_model
 
- This model is a fine-tuned version of [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) on an unknown dataset.
+ This model is a fine-tuned version of [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3074
+ - Loss: 0.7426
 
  ## Model description
 
@@ -37,35 +37,47 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 16
- - eval_batch_size: 8
+ - train_batch_size: 64
+ - eval_batch_size: 32
  - seed: 42
- - gradient_accumulation_steps: 8
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
  - total_train_batch_size: 128
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 3
+ - mixed_precision_training: Native AMP
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 0.4333 | 0.2712 | 20 | 0.3822 |
- | 0.3086 | 0.5424 | 40 | 0.3394 |
- | 0.2944 | 0.8136 | 60 | 0.3268 |
- | 0.2803 | 1.0847 | 80 | 0.3200 |
- | 0.2688 | 1.3559 | 100 | 0.3165 |
- | 0.2675 | 1.6271 | 120 | 0.3125 |
- | 0.2627 | 1.8983 | 140 | 0.3095 |
- | 0.2529 | 2.1695 | 160 | 0.3089 |
- | 0.253 | 2.4407 | 180 | 0.3079 |
- | 0.2513 | 2.7119 | 200 | 0.3074 |
+ | 0.9054 | 0.1571 | 100 | 0.9336 |
+ | 0.7821 | 0.3142 | 200 | 0.8612 |
+ | 0.7367 | 0.4713 | 300 | 0.8301 |
+ | 0.7238 | 0.6284 | 400 | 0.8054 |
+ | 0.6822 | 0.7855 | 500 | 0.7912 |
+ | 0.6511 | 0.9427 | 600 | 0.7823 |
+ | 0.6166 | 1.0998 | 700 | 0.7764 |
+ | 0.5797 | 1.2569 | 800 | 0.7649 |
+ | 0.5902 | 1.4140 | 900 | 0.7541 |
+ | 0.5916 | 1.5711 | 1000 | 0.7562 |
+ | 0.5816 | 1.7282 | 1100 | 0.7468 |
+ | 0.5662 | 1.8853 | 1200 | 0.7448 |
+ | 0.4927 | 2.0424 | 1300 | 0.7501 |
+ | 0.4796 | 2.1995 | 1400 | 0.7540 |
+ | 0.4683 | 2.3566 | 1500 | 0.7472 |
+ | 0.4854 | 2.5137 | 1600 | 0.7453 |
+ | 0.4733 | 2.6709 | 1700 | 0.7455 |
+ | 0.4643 | 2.8280 | 1800 | 0.7431 |
+ | 0.4535 | 2.9851 | 1900 | 0.7426 |
 
 
  ### Framework versions
 
- - PEFT 0.13.1
- - Transformers 4.44.2
+ - PEFT 0.13.2.dev0
+ - Transformers 4.45.2
  - Pytorch 2.4.1+cu121
- - Tokenizers 0.19.1
+ - Datasets 3.0.1
+ - Tokenizers 0.20.0
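For reference, a minimal sketch of how the updated hyperparameters above could map onto `transformers.TrainingArguments`. The `output_dir` name and the use of `fp16` are assumptions: the card only reports "Native AMP" and "multi-GPU" without naming the precision or the number of GPUs.

```python
# Sketch only: mirrors the hyperparameters listed in the updated model card.
# output_dir and fp16 are assumptions not recorded in the diff.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="instruction_tuned_model",  # assumed name
    learning_rate=2e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,         # 64 x 2 -> reported total batch size of 128
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                             # "Native AMP"; bf16 would also fit the card
)
```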
adapter_config.json CHANGED
@@ -1,8 +1,9 @@
  {
    "alpha_pattern": {},
    "auto_mapping": null,
-   "base_model_name_or_path": "myewon/finetuning_result",
+   "base_model_name_or_path": null,
    "bias": "none",
+   "exclude_modules": null,
    "fan_in_fan_out": false,
    "inference_mode": true,
    "init_lora_weights": true,
@@ -20,8 +21,8 @@
    "rank_pattern": {},
    "revision": null,
    "target_modules": [
-     "v_proj",
-     "q_proj"
+     "q_proj",
+     "v_proj"
    ],
    "task_type": "CAUSAL_LM",
    "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e822fdb4746e301faa9478bae8224634d5ee048435da46723608325e51058bbc
- size 197461872
+ oid sha256:ee7772e35c9d754b9b5a42c201bec23825e169865f6a4e3ab3f0f205d258787d
+ size 98764168
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c7216fece0b2bca37f129a08380de7c5e307439795de432b9debba3555d02d58
- size 5536
+ oid sha256:a3d907099a73f42f4550183f061c3ce76801b3030c753c62c9c06ab86deb1d08
+ size 6752
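The adapter_model.safetensors and training_args.bin changes above are Git LFS pointer updates (new `oid sha256` and `size`). A small, self-contained sketch for checking that a downloaded file matches its pointer; the helper name is ours and not part of this repo.

```python
# Sketch only: compare a local file against the oid/size recorded in an LFS pointer.
import hashlib
import os

def matches_lfs_pointer(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer values."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Example with the new adapter weights from this commit:
# matches_lfs_pointer(
#     "adapter_model.safetensors",
#     "ee7772e35c9d754b9b5a42c201bec23825e169865f6a4e3ab3f0f205d258787d",
#     98764168,
# )
```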