kurogane/Gemma-2-2B-SearchHelper-20240906

kurogane committed on Sep 7, 2024

Commit 9c32e23 (verified) · 1 parent: b348c0d

Update README.md

Files changed (1)

README.md +32 -11

README.md CHANGED

@@ -122,20 +122,41 @@ print(json_text)
 ### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
 #### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 [More Information Needed]

 ### Training Procedure
+Trained with Unsloth. Training took roughly 35 minutes on an A6000.
 #### Training Hyperparameters
+```
+from trl import SFTTrainer
+from transformers import TrainingArguments
+from unsloth import is_bfloat16_supported
+trainer = SFTTrainer(
+    model = model,
+    tokenizer = tokenizer,
+    train_dataset = dataset,
+    dataset_text_field = "text",
+    max_seq_length = max_seq_length,
+    dataset_num_proc = 2,
+    packing = False, # Can make training 5x faster for short sequences.
+    args = TrainingArguments(
+        per_device_train_batch_size = 8,
+        gradient_accumulation_steps = 8,
+        warmup_steps = 5,
+        num_train_epochs = 1, # Set this for 1 full training run.
+        # max_steps = 60,
+        learning_rate = 2e-4,
+        fp16 = not is_bfloat16_supported(),
+        bf16 = is_bfloat16_supported(),
+        logging_steps = 1,
+        optim = "adamw_8bit",
+        weight_decay = 0.01,
+        lr_scheduler_type = "linear",
+        seed = 3407,
+        output_dir = "outputs",
+    ),
+)
+```
 [More Information Needed]
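The hyperparameters in the diff imply an effective batch of `per_device_train_batch_size × gradient_accumulation_steps` examples per optimizer step. They also use a learning-rate schedule (`lr_scheduler_type = "linear"` with `warmup_steps = 5`) that ramps the rate up from 0 to `learning_rate` and then decays it linearly back to 0. The following plain-Python sketch mirrors that arithmetic without calling trl or transformers. It assumes a single GPU, which matches the A6000 run described above. The helper names and the 100-step total are hypothetical and do not come from the actual training run.

```python
# Sketch (assumption): re-derives the batch/LR arithmetic implied by the
# TrainingArguments above; it does not import or call trl/transformers.

PER_DEVICE_BATCH = 8   # per_device_train_batch_size
GRAD_ACCUM_STEPS = 8   # gradient_accumulation_steps
NUM_DEVICES = 1        # assumption: single GPU (one A6000)
WARMUP_STEPS = 5       # warmup_steps
PEAK_LR = 2e-4         # learning_rate


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Examples that contribute to each optimizer step."""
    return per_device * accum * devices


def linear_warmup_lr(step: int, total_steps: int,
                     warmup: int = WARMUP_STEPS, peak: float = PEAK_LR) -> float:
    """LR at `step`: linear warmup from 0 to `peak`, then linear decay to 0."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 64
    total = 100  # hypothetical number of optimizer steps
    for s in (0, 3, 5, 50, 100):
        print(s, linear_warmup_lr(s, total))
```

The decay formula follows the warmup-then-linear-decay shape that the `"linear"` scheduler type describes. The real number of optimizer steps depends on the dataset size, which the model card does not state.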