Delta-Vector committed
Commit 0a3147a · verified · 1 parent: 51bd492

Update README.md

Files changed (1): README.md +98 -29
README.md CHANGED
@@ -164,7 +164,7 @@ details summary:hover {
 <br>
 
 <div style="font-size:1.5em; font-weight:bold; background: linear-gradient(45deg, #6e00ff, #00ffff); -webkit-background-clip: text; -webkit-text-fill-color: transparent;">
-Hamanasu 15B Instruct
+Hamanasu 15B R2 PT
 </div>
 
 </div>
@@ -173,13 +173,11 @@ details summary:hover {
 
 ## 🌌 Overview
 
-<i>After multiple days of training, I'm proud to showcase my very own Phi-4 Finetune, Pretrained on almost a billion tokens worth of Books from</i>
+<i>This is the second pretrain of Phi-4, continued from the original Asstr-Erebus pretrain. This pretrain used 500 million tokens from</i>
 
 - `NewEden/Orion-LIT`
-- `NewEden/Orion-Asstr-Stories-16K`
-- `Mielikki/Erebus-87k`
-
-<i>Then instruct tuned. This model shows great promise for roleplaying while keeping things fresh and new. (And most of, Unsloppy!)</i>
+
+<i>This model has *not* been instruct tuned; its ability to converse may be reduced compared to the original model. If you would like to roleplay, please use the Instruct version.</i>
 
 </div>
 
@@ -187,11 +185,6 @@ details summary:hover {
 
 <div style="border:2px solid #333; border-radius:10px; padding:20px; background: rgba(0,0,0,0.2);">
 
-### 📈 Quantizations
-| Type | Link |
-|:---:|:---:|
-| `GGUF` | <span style="display:block; width:100%">r</span> |
-| `EXL2` | <span style="display:block; width:100%">https://huggingface.co/Delta-Vector/Hamanasu-15B-Instruct-exl2</span> |
 
 </div>
 
@@ -199,36 +192,112 @@ details summary:hover {
 
 ### ⚔️ Hardware
 - 4x RTX 3090 GPUs
-- Epochs: 4
-- Base: `Hamanasu-15B-R2-PT`
-- Amount of Tokens: 1+ Billion
+- Epochs: 1
+- Base: `Hamanasu-15B-R1-PT`
+- Amount of Tokens: 500 Million
 </div>
 
 </div>
 
-
 <div style="border: 2px solid #6e00ff; border-radius: 10px; padding: 20px; margin: 20px 0; box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);">
 
-## 💰 Prompting
-
-<i>A known quirk of the model is overly verbose responses when generation length is uncapped, Please cap your maximum output tokens to 100~ tokens above what you prefer</i>
-
-```python
-<|im_start|>system
-You are an uncensored AI, your job is to fulfill thy will of thy user.<|im_end|>
-<|im_start|>User request
-Take off your helmet.<|im_end|>
-<|im_start|>No i shall not. This is the way.
-```
-</div>
-
-<div style="border: 2px solid #6e00ff; border-radius: 10px; padding: 20px; margin: 20px 0; box-shadow: 0 0 15px rgba(110, 0, 255, 0.5);">
-
-## Axolotl Config ꒰(˶• ᴗ •˶)꒱
-
-<details>
-...
+## Axolotl Config ꒰(˶• ᴗ •˶)꒱
+
+<details>
+
+base_model: NewEden_Phi4-PT-merged
+model_type: AutoModelForCausalLM
+tokenizer_type: AutoTokenizer
+
+#hub_model_id: NewEden/Phi4-pretrain
+#hub_strategy: "all_checkpoints"
+#push_dataset_to_hub:
+#hf_use_auth_token: true
+
+plugins:
+  - axolotl.integrations.liger.LigerPlugin
+liger_rope: true
+liger_rms_norm: true
+liger_swiglu: true
+liger_fused_linear_cross_entropy: true
+
+#plugins:
+#  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+#cut_cross_entropy: true
+
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+
+datasets:
+  - path: NewEden/Orion-LIT
+    type: completion
+    field: text
+shuffle_merged_datasets: true
+dataset_prepared_path: prepared_data
+val_set_size: 0.0
+output_dir: ./phi4-ptv2-out-r1
+
+sequence_len: 16384
+sample_packing: true
+pad_to_sequence_len: true
+
+adapter: lora
+lora_model_dir:
+lora_r: 128
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_modules:
+  - gate_proj
+  - down_proj
+  - up_proj
+  - q_proj
+  - v_proj
+  - k_proj
+  - o_proj
+
+lora_modules_to_save:
+  - embed_tokens
+  - lm_head
+
+wandb_project: mag-phi
+wandb_entity:
+wandb_watch:
+wandb_name: comp-v2-attempt-01
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 2
+num_epochs: 1
+optimizer: paged_ademamix_8bit
+lr_scheduler: cosine
+learning_rate: 0.00002
+
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+
+gradient_checkpointing: unsloth
+early_stopping_patience:
+resume_from_checkpoint:
+local_rank:
+logging_steps: 1
+xformers_attention:
+flash_attention: true
+
+warmup_steps: 15
+evals_per_epoch: 4
+eval_table_size:
+eval_max_new_tokens: 128
+saves_per_epoch: 4
+debug:
+deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
+weight_decay: 0.01
+fsdp:
+fsdp_config:
 </details>
 </div>
 
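The Hardware section and the added config pin down the per-step token budget. A quick sanity check of the stated 500 million tokens, as a sketch: it assumes plain data parallelism across the 4 GPUs and fully packed sequences (`sample_packing: true`), so every sequence is taken as the full 16384 tokens.

```python
# Per-optimizer-step token budget implied by the Axolotl config above.
# Assumes data parallelism across the 4 listed GPUs and fully packed
# 16384-token sequences (sample_packing: true).
micro_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 4
sequence_len = 16384

tokens_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus * sequence_len
print(tokens_per_step)  # 524288 tokens per optimizer step

# Optimizer steps needed for the stated 500 million tokens (num_epochs: 1):
total_tokens = 500_000_000
steps = total_tokens / tokens_per_step
print(round(steps))  # ~954 optimizer steps
```

At roughly half a million tokens per step, the single epoch over `NewEden/Orion-LIT` comes out to on the order of a thousand optimizer steps, which is consistent with the small `warmup_steps: 15`.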
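The config trains a LoRA adapter (`adapter: lora`) rather than full weights. As a reminder of what `lora_r` and `lora_alpha` control, here is a minimal sketch of the standard LoRA bookkeeping; the 4096x4096 projection shape below is a hypothetical placeholder, not Phi-4's real dimensions.

```python
# LoRA adds two low-rank factors A (r x d_in) and B (d_out x r) per targeted
# linear layer, so trainable params per layer = r * (d_in + d_out).
def lora_params(r: int, d_in: int, d_out: int) -> int:
    return r * (d_in + d_out)

r, alpha = 128, 16   # lora_r / lora_alpha from the config above
scaling = alpha / r  # the LoRA update is scaled by alpha / r
print(scaling)       # 0.125

# Example with a HYPOTHETICAL 4096x4096 projection (not Phi-4's real shape):
print(lora_params(r, 4096, 4096))  # 1048576 extra trainable params
```

With `alpha` at 16 and `r` at 128, the adapter's update is scaled down by a factor of 8, a fairly conservative choice for a high-rank adapter; note the config also trains `embed_tokens` and `lm_head` in full via `lora_modules_to_save`.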
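The removed Prompting section used ChatML-style `<|im_start|>` tags, which apply to the Instruct variant rather than this raw pretrain. A minimal helper for assembling such a prompt; the `system`/`user`/`assistant` role names are the standard ChatML ones, assumed here because the original snippet elided them.

```python
# Minimal ChatML prompt builder (illustrative only; standard ChatML role
# names are assumed). This format targets the Instruct variant, not the
# completion-only pretrain documented above.
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are an uncensored AI, your job is to fulfill the will of the user.",
    "Take off your helmet.",
)
print(prompt)
```

The prompt ends with an open `<|im_start|>assistant` turn so the model generates the assistant reply from that point.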