---
library_name: peft
license: apache-2.0
base_model: PocketDoc/Dans-PersonalityEngine-V1.1.0-12b
tags:
- generated_from_trainer
datasets:
- Personamaxx-VN.json
- NewEden/LIMARP-Complexity
- NewEden/PIPPA-Mega-Filtered
- NewEden/OpenCAI-ShareGPT
- NewEden/Creative_Writing-Complexity
- NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
- prosemaxx-adventure-failuremaxx.json
- NewEden/Books-V2-ShareGPT
- NewEden/Deepseek-V3-RP-Filtered
- NewEden/BlueSky-10K-Complexity
- NewEden/Final-Alpindale-LNs-ShareGPT
- NewEden/DeepseekRP-Filtered
- NewEden/RP-logs-V2-Experimental
- anthracite-org/kalo_opus_misc_240827
- anthracite-org/kalo_misc_part2
- NewEden/vanilla-backrooms-claude-sharegpt
- NewEden/Storium-Prefixed-Clean
model-index:
- name: output/Francois-V2
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0`
```yaml
base_model: PocketDoc/Dans-PersonalityEngine-V1.1.0-12b

## Liger+CCE
plugins:
  - axolotl.integrations.liger.LigerPlugin
#  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
#cut_cross_entropy: false

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Personamaxx-VN.json
    type: dan-chat-advanced
  - path: NewEden/LIMARP-Complexity
    type: dan-chat-advanced
  - path: NewEden/PIPPA-Mega-Filtered
    type: dan-chat-advanced
  - path: NewEden/OpenCAI-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/Creative_Writing-Complexity
    type: dan-chat-advanced
  - path: NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
    type: dan-chat-advanced
  - path: prosemaxx-adventure-failuremaxx.json
    type: dan-chat-advanced
  - path: NewEden/Books-V2-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/Deepseek-V3-RP-Filtered
    type: dan-chat-advanced
  - path: NewEden/BlueSky-10K-Complexity
    type: dan-chat-advanced
  - path: NewEden/Final-Alpindale-LNs-ShareGPT
    type: dan-chat-advanced
  - path: NewEden/DeepseekRP-Filtered
    type: dan-chat-advanced
  - path: NewEden/RP-logs-V2-Experimental
    type: dan-chat-advanced
  - path: anthracite-org/kalo_opus_misc_240827
    type: dan-chat-advanced
  - path: anthracite-org/kalo_misc_part2
    type: dan-chat-advanced
  - path: NewEden/vanilla-backrooms-claude-sharegpt
    type: dan-chat-advanced
  - path: NewEden/Storium-Prefixed-Clean
    type: dan-chat-advanced

## LOra so we dont fuck brains
adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
peft_use_rslora: true
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
#lora_modules_to_save:
#  - embed_tokens
#  - lm_head

shuffle_merged_datasets: true
dataset_prepared_path: prepared_data
output_dir: ./output/Francois-V2

## Ctx Length
sequence_len: 16384
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false
#batch_flattening: true

#torch_compile: auto  # Optional[Union[Literal["auto"], bool]]
#torch_compile_backend:  # Optional[str]

## Wandb
wandb_project: Francois
wandb_entity:
wandb_watch:
wandb_name: v3
wandb_log_model:

## Hparams
gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 4
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 3e-5
max_grad_norm: 0.0001
weight_decay: 0.02
warmup_steps: 40

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

## Unsloth is broken, Use grad-ckpting.
gradient_checkpointing: true
early_stopping_patience:
#resume_from_checkpoint: /home/ubuntu/Mango/axolotl/outputs/checkpoint-1088
local_rank:
logging_steps: 1
xformers_attention: false
flash_attention: true
s2_attention:

## Evals
val_set_size: 0.0025
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2
debug:

# Multi-GPU
deepspeed: ./deepspeed_configs/zero2.json
fsdp:
fsdp_config:

special_tokens:
  pad_token:
```

</details><br>
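The config above trains a LoRA adapter (`adapter: lora`, r=128, alpha=16, with rsLoRA) rather than full weights, so the artifact in `output_dir` is an adapter that must be attached to the base model at inference time. Below is a minimal loading sketch using `peft` and `transformers`; the adapter path is taken from the config's `output_dir` and is illustrative only, since this card does not name a published adapter repo.

```python
# Minimal sketch (not part of the original card): attach the trained LoRA
# adapter to the base model for inference, optionally merging it in.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "PocketDoc/Dans-PersonalityEngine-V1.1.0-12b"
ADAPTER = "./output/Francois-V2"  # hypothetical local path, from output_dir above

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# Apply the LoRA deltas; merge_and_unload() folds them into the base weights
# so the result behaves (and can be saved) as a standalone model.
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()

prompt = "Hello!"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```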

# output/Francois-V2

This model is a fine-tuned version of [PocketDoc/Dans-PersonalityEngine-V1.1.0-12b](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.1.0-12b) on the following datasets:

- Personamaxx-VN.json
- NewEden/LIMARP-Complexity
- NewEden/PIPPA-Mega-Filtered
- NewEden/OpenCAI-ShareGPT
- NewEden/Creative_Writing-Complexity
- NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
- prosemaxx-adventure-failuremaxx.json
- NewEden/Books-V2-ShareGPT
- NewEden/Deepseek-V3-RP-Filtered
- NewEden/BlueSky-10K-Complexity
- NewEden/Final-Alpindale-LNs-ShareGPT
- NewEden/DeepseekRP-Filtered
- NewEden/RP-logs-V2-Experimental
- anthracite-org/kalo_opus_misc_240827
- anthracite-org/kalo_misc_part2
- NewEden/vanilla-backrooms-claude-sharegpt
- NewEden/Storium-Prefixed-Clean

It achieves the following results on the evaluation set:
- Loss: 2.1779

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: paged_ademamix_8bit (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 4.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.731         | 0.0023 | 1    | 2.4143          |
| 1.451         | 0.2506 | 109  | 2.3014          |
| 1.4026        | 0.5011 | 218  | 2.2824          |
| 1.6573        | 0.7517 | 327  | 2.2581          |
| 1.587         | 1.0023 | 436  | 2.2424          |
| 1.2928        | 1.2529 | 545  | 2.2229          |
| 1.4023        | 1.5034 | 654  | 2.2034          |
| 1.6312        | 1.7540 | 763  | 2.1959          |
| 1.3044        | 2.0046 | 872  | 2.1909          |
| 1.4984        | 2.2552 | 981  | 2.1876          |
| 1.3767        | 2.5057 | 1090 | 2.1840          |
| 1.3972        | 2.7563 | 1199 | 2.1812          |
| 1.3663        | 3.0069 | 1308 | 2.1792          |
| 1.4958        | 3.2575 | 1417 | 2.1785          |
| 1.4214        | 3.5080 | 1526 | 2.1784          |
| 1.4001        | 3.7586 | 1635 | 2.1779          |

### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
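The `total_train_batch_size` and `total_eval_batch_size` reported under the training hyperparameters are derived values, not directly configured ones. A quick sketch of the arithmetic under the reported 8-device setup (all inputs are the values listed on this card):

```python
# Illustrative arithmetic only; inputs are the hyperparameters listed above.
micro_batch_size = 2              # per-device train batch size
gradient_accumulation_steps = 2
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32

# Evaluation does not accumulate gradients, so only the device count multiplies in.
total_eval_batch_size = micro_batch_size * num_devices
assert total_eval_batch_size == 16
```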