---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: 03d10d3d-d0de-4e3a-a192-1749f29583a3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`
```yaml
adapter: lora
base_model: Qwen/Qwen3-8B-Base
bf16: true
datasets:
- data_files:
  - 45c346a7c1e52747_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/
  type:
    field_input: None
    field_instruction: instruct
    field_output: output
    field_system: None
    format: None
    no_input_format: None
    system_format: '{system}'
    system_prompt: None
eval_max_new_tokens: 128
evals_per_epoch: 4
flash_attention: false
fp16: false
gradient_accumulation_steps: 1
gradient_checkpointing: true
group_by_length: true
hub_model_id: apriasmoro/03d10d3d-d0de-4e3a-a192-1749f29583a3
learning_rate: 0.0002
load_in_4bit: false
logging_steps: 10
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: false
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_steps: 14
micro_batch_size: 16
mlflow_experiment_name: /tmp/45c346a7c1e52747_train_data.json
output_dir: llama3_lora_output
rl: null
sample_packing: true
save_steps: 6
sequence_len: 2048
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: true
trl: null
trust_remote_code: true
wandb_name: c732d2b4-46df-4ed8-83ee-7525f648965f
wandb_project: Gradients-On-Demand
wandb_run: llama3_h200_run
wandb_runid: c732d2b4-46df-4ed8-83ee-7525f648965f
warmup_steps: 100
weight_decay: 0.01

```

</details><br>

# 03d10d3d-d0de-4e3a-a192-1749f29583a3

This model is a LoRA adapter for [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base), fine-tuned with Axolotl on the dataset referenced in the config above.

## Model description

More information needed

## Intended uses & limitations

More information needed
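
Although intended uses are not yet documented, note that this repository contains only the PEFT LoRA adapter weights, so the adapter has to be applied on top of the base model at load time. The snippet below is a minimal, illustrative sketch (not part of the original card) of how the adapter could be loaded for inference with `transformers` and `peft`; the adapter id comes from `hub_model_id` in the config above, while the prompt and generation settings are placeholder assumptions.

```python
# Illustrative sketch: load Qwen3-8B-Base and apply this LoRA adapter for inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "Qwen/Qwen3-8B-Base"
ADAPTER_ID = "apriasmoro/03d10d3d-d0de-4e3a-a192-1749f29583a3"  # hub_model_id from the config above

tokenizer = AutoTokenizer.from_pretrained(BASE_ID, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,  # training ran in bf16; adjust to your hardware
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

prompt = "Explain LoRA fine-tuning in one sentence."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)  # matches eval_max_new_tokens above
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```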

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative PEFT reconstruction follows the list):
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 14
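
For readers who want these settings in PEFT/`transformers` terms, the hyperparameters above together with the LoRA values from the Axolotl config (`lora_r: 8`, `lora_alpha: 16`, `lora_dropout: 0.05`, `lora_target_linear: true`) roughly correspond to the objects below. This is an illustrative reconstruction, not the exact configuration Axolotl built internally; in particular, the explicit `target_modules` list is an assumption about which Qwen3 linear projections `lora_target_linear` expands to.

```python
# Illustrative reconstruction of the training setup described above (not the exact Axolotl internals).
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,               # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.05, # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed expansion of lora_target_linear: true for Qwen3-style decoder blocks:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

training_args = TrainingArguments(
    output_dir="llama3_lora_output",
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=14,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)
```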

### Training results



### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.5.1+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1