See axolotl config

axolotl version: `0.8.0`

```yaml
base_model: PocketDoc/Dans-PersonalityEngine-V1.1.0-12b
## Liger+CCE
plugins:
  - axolotl.integrations.liger.LigerPlugin
  # - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
#cut_cross_entropy: false
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Personamaxx-VN.json
    type: dan-chat-advanced
  - path: NewEden/LIMARP-Complexity
    type: dan-chat-advanced
  - path: NewEden/PIPPA-Mega-Filtered
    type: dan-chat-advanced
  - path: NewEden/OpenCAI-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/Creative_Writing-Complexity
    type: dan-chat-advanced
  - path: NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
    type: dan-chat-advanced
  - path: prosemaxx-adventure-failuremaxx.json
    type: dan-chat-advanced
  - path: NewEden/Books-V2-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/Deepseek-V3-RP-Filtered
    type: dan-chat-advanced
  - path: NewEden/BlueSky-10K-Complexity
    type: dan-chat-advanced
  - path: NewEden/Final-Alpindale-LNs-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/DeepseekRP-Filtered
    type: dan-chat-advanced
  - path: NewEden/RP-logs-V2-Experimental
    type: dan-chat-advanced
  - path: anthracite-org/kalo_opus_misc_240827
    type: dan-chat-advanced
  - path: anthracite-org/kalo_misc_part2
    type: dan-chat-advanced
  - path: NewEden/vanilla-backrooms-claude-sharegpt
    type: dan-chat-advanced
  - path: NewEden/Storium-Prefixed-Clean
    type: dan-chat-advanced
## LoRA so we don't wreck the base model
adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
peft_use_rslora: true
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
#lora_modules_to_save:
# - embed_tokens
# - lm_head
shuffle_merged_datasets: true
dataset_prepared_path: prepared_data
output_dir: ./output/Francois-V2
## Ctx Length
sequence_len: 16384
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false
#batch_flattening: true
#torch_compile: auto # Optional[Union[Literal["auto"], bool]]
#torch_compile_backend: # Optional[str]
## Wandb
wandb_project: Francois
wandb_entity:
wandb_watch:
wandb_name: v3
wandb_log_model:
## Hparams
gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 4
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 3e-5
max_grad_norm: 0.0001
weight_decay: 0.02
warmup_steps: 40
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
## Unsloth is broken; use gradient checkpointing.
gradient_checkpointing: true
early_stopping_patience:
#resume_from_checkpoint: /home/ubuntu/Mango/axolotl/outputs/checkpoint-1088
local_rank:
logging_steps: 1
xformers_attention: False
flash_attention: True
s2_attention:
## Evals
val_set_size: 0.0025
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2
debug:
# Multi-GPU
deepspeed: ./deepspeed_configs/zero2.json
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>
```
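For reference, the adapter settings above map roughly onto a PEFT `LoraConfig`; the sketch below only illustrates those values (axolotl constructs this internally, so the exact object may differ):

```python
# Rough PEFT equivalent of the LoRA block in the config above (illustrative only).
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,             # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.05, # lora_dropout
    use_rslora=True,   # peft_use_rslora
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
```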
# output/Francois-V2
This model is a fine-tuned version of PocketDoc/Dans-PersonalityEngine-V1.1.0-12b on the Personamaxx-VN.json, the NewEden/LIMARP-Complexity, the NewEden/PIPPA-Mega-Filtered, the NewEden/OpenCAI-ShareGPT, the NewEden/Creative_Writing-Complexity, the NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed, the prosemaxx-adventure-failuremaxx.json, the NewEden/Books-V2-ShareGPT, the NewEden/Deepseek-V3-RP-Filtered, the NewEden/BlueSky-10K-Complexity, the NewEden/Final-Alpindale-LNs-ShareGPT, the NewEden/DeepseekRP-Filtered, the NewEden/RP-logs-V2-Experimental, the anthracite-org/kalo_opus_misc_240827, the anthracite-org/kalo_misc_part2, the NewEden/vanilla-backrooms-claude-sharegpt and the NewEden/Storium-Prefixed-Clean datasets. It achieves the following results on the evaluation set:
- Loss: 2.1779
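A minimal inference sketch, assuming the LoRA adapter is published as a PEFT adapter repository (the adapter repo id below is an assumption; substitute the actual one):

```python
# Load the base model and attach the LoRA adapter (adapter repo id is an assumption).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "PocketDoc/Dans-PersonalityEngine-V1.1.0-12b"
adapter_id = "NewEden/Francois-CKPTs"  # assumed adapter location; adjust as needed

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Write the opening scene of a light novel."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```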
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: paged_ademamix_8bit (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 4.0
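The effective batch sizes follow directly from the per-device batch size, gradient accumulation, and device count; a quick sanity check of the numbers above:

```python
# Sanity check of the reported effective batch sizes (values taken from the list above).
micro_batch_size = 2
gradient_accumulation_steps = 2
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation during eval

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 16
```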
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.731 | 0.0023 | 1 | 2.4143 |
| 1.451 | 0.2506 | 109 | 2.3014 |
| 1.4026 | 0.5011 | 218 | 2.2824 |
| 1.6573 | 0.7517 | 327 | 2.2581 |
| 1.587 | 1.0023 | 436 | 2.2424 |
| 1.2928 | 1.2529 | 545 | 2.2229 |
| 1.4023 | 1.5034 | 654 | 2.2034 |
| 1.6312 | 1.7540 | 763 | 2.1959 |
| 1.3044 | 2.0046 | 872 | 2.1909 |
| 1.4984 | 2.2552 | 981 | 2.1876 |
| 1.3767 | 2.5057 | 1090 | 2.1840 |
| 1.3972 | 2.7563 | 1199 | 2.1812 |
| 1.3663 | 3.0069 | 1308 | 2.1792 |
| 1.4958 | 3.2575 | 1417 | 2.1785 |
| 1.4214 | 3.5080 | 1526 | 2.1784 |
| 1.4001 | 3.7586 | 1635 | 2.1779 |
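To see the trend at a glance, the validation-loss column can be plotted against the step column (a small matplotlib sketch using the values from the table):

```python
# Plot validation loss over training steps (values copied from the table above).
import matplotlib.pyplot as plt

steps = [1, 109, 218, 327, 436, 545, 654, 763,
         872, 981, 1090, 1199, 1308, 1417, 1526, 1635]
val_loss = [2.4143, 2.3014, 2.2824, 2.2581, 2.2424, 2.2229, 2.2034, 2.1959,
            2.1909, 2.1876, 2.1840, 2.1812, 2.1792, 2.1785, 2.1784, 2.1779]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("Francois-V2 validation loss")
plt.show()
```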
### Framework versions
- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1