# Hamanasu 4B

## 🌌 Overview

This model is the Chat tune of the Instruct model. More accurately, it is the "brainrotted" version, finetuned with Bsky, 4chan and Discord logs. It's... really something beautiful.

The model is best suited to being a highly dumb chat partner rather than regular RP, although it can also be used for traditional RP.

All thanks to Tav for funding the train.

Support me and my finetunes on Ko-Fi: https://ko-fi.com/deltavector

### ⚔️ Hardware

- 8x H100s
- Epochs: 4
- Base: `Delta-Vector/Hamanasu-4B-Instruct-KTO-V2`

## 🎲 Recommended Sampler Preset

```yml
ST sampler preset: https://files.catbox.moe/wtkp0l.json

System prompt: Blank.
```

## Axolotl Config ꒰(˶• ᴗ •˶)꒱

```yaml
base_model: ./model
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

hub_model_id: NewEden/Hamanasu-4B-RP-v2
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

## qlora COPE
load_in_8bit: false
load_in_4bit: false
strict: false

## data
datasets:
  - path: NewEden/Discord-Filtered
    type: dan-chat-advanced
  - path: NewEden/Basket-Weaving-Filtered
    type: dan-chat-advanced
  - path: NewEden/Misc-Data-Sharegpt-Prefixed
    type: dan-chat-advanced
  - path: NewEden/BlueSky-10K-Complexity
    type: dan-chat-advanced
  - path: PocketDoc/Dans-Kinomaxx-VanillaBackrooms
    type: dan-chat-advanced
  - path: PocketDoc/Dans-Personamaxx-VN
    type: dan-chat-advanced
  - path: NewEden/LIMARP-Complexity
    type: dan-chat-advanced
  - path: NewEden/OpenCAI-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/Creative_Writing-Complexity
    type: dan-chat-advanced
  - path: NewEden/DeepseekRP-Filtered
    type: dan-chat-advanced
  - path: NewEden/Storium-Prefixed-Clean
    type: dan-chat-advanced
shuffle_merged_datasets: true
dataset_prepared_path: dataset_prepared-2
val_set_size: 0.01
output_dir: 4b-out

## LIGGER
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

## CTX settings
sequence_len: 32768
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

## Lora
#adapter: lora
#lora_model_dir:
#lora_r: 128
#lora_alpha: 16
#lora_dropout: 0.05
#lora_target_modules:
#  - gate_proj
#  - down_proj
#  - up_proj
#  - q_proj
#  - v_proj
#  - k_proj
#  - o_proj
#lora_fan_in_fan_out:
#peft_use_rslora: true
#lora_modules_to_save:
#  - embed_tokens
#  - lm_head

## WandB
wandb_project: tavbussy
wandb_entity:
wandb_watch:
wandb_name: chat-v2
wandb_log_model:

## evals
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128

## hoe params
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2e-5
max_grad_norm: 0.2

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 40
saves_per_epoch: 2
debug:
deepspeed: ./deepspeed_configs/zero3_bf16.json
weight_decay: 0.02
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|finetune_right_pad_id|>
```
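For a rough sense of scale, the batch settings in the config above can be turned into an effective batch size. This is only a back-of-the-envelope sketch: it assumes all 8 H100s from the Hardware section act as data-parallel ranks, which the card does not state outright, and the helper names are made up for illustration.

```python
# Values copied from the Axolotl config and the Hardware section.
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 2
SEQUENCE_LEN = 32768
NUM_GPUS = 8  # assumption: every H100 is one data-parallel rank


def effective_batch_size(micro_batch, grad_accum, num_gpus):
    """Packed sequences consumed per optimizer step across all ranks."""
    return micro_batch * grad_accum * num_gpus


def max_tokens_per_step(batch_size, sequence_len):
    """Upper bound on tokens per optimizer step (padding counts as tokens)."""
    return batch_size * sequence_len


batch = effective_batch_size(MICRO_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS)
tokens = max_tokens_per_step(batch, SEQUENCE_LEN)

print(batch)   # 16
print(tokens)  # 524288
```

Under that assumption, each optimizer step sees 16 packed sequences, or up to about 524k tokens. Because `sample_packing` and `pad_to_sequence_len` are both enabled, every sequence is 32768 tokens long, some of which may be padding.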

## ⚡ Credits

---

Made by Delta-Vector