laineyyy committed · Commit 93c7c03 · verified · 1 Parent(s): c304d85

Update README.md

Files changed (1): README.md (+55 -2)
README.md CHANGED
@@ -14,7 +14,7 @@ This is an SFT-tuned model of [Poro-34B](https://huggingface.co/LumiOpen/Poro-34
 
 ## Datasets
 
-**SFT**
+### SFT
 
 We use a curated subset of Open Assistant 2 and translated the dataset into Finnish using Poro-34B.
 
@@ -24,14 +24,67 @@ We use a curated subset of Open Assistant 2 and translated the dataset into Finn
 **Finnish OASST2**
 - [instruction-collection-fin](https://huggingface.co/datasets/LumiOpen/instruction-collection-fin) (oasst2 subset)
 
-**DPO**
+### DPO
 
 
 ## Recipes
 
 **SFT**
 
+```
+bf16: true
+do_eval: true
+evaluation_strategy: epoch
+gradient_accumulation_steps: 2
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: False
+learning_rate: 2.0e-05
+log_level: info
+logging_steps: 50
+logging_strategy: steps
+lr_scheduler_type: cosine
+max_seq_length: 2048
+max_steps: -1
+num_train_epochs: 3
+output_dir: data/poro-sft-oasst2
+overwrite_output_dir: true
+per_device_eval_batch_size: 4
+per_device_train_batch_size: 2
+remove_unused_columns: true
+save_strategy: "epoch"
+save_total_limit: 1
+seed: 42
+warmup_ratio: 0.1
+```
+
 **DPO**
+```
+bf16: true
+beta: 0.05
+do_eval: true
+evaluation_strategy: epoch
+gradient_accumulation_steps: 1
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: False
+learning_rate: 5.0e-7
+log_level: info
+logging_steps: 20
+lr_scheduler_type: cosine
+max_length: 1024
+max_prompt_length: 512
+num_train_epochs: 5
+optim: adamw_torch
+output_dir: data/poro-dpo-helpsteer2
+per_device_train_batch_size: 2
+per_device_eval_batch_size: 4
+save_strategy: "epoch"
+save_total_limit: 1
+seed: 42
+warmup_ratio: 0.1
+```
+
 
 ## Evaluation