---
base_model: mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
library_name: peft
tags:
- generated_from_trainer
model-index:
- name: Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat
  results: []
license: llama3.1
datasets:
- Enderchef/ICONN-1-BasicChat-Data-SuperLite
---

[Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) fine-tuned with LoRA on the [ICONN-1-BasicChat-Data-SuperLite](https://huggingface.co/datasets/Enderchef/ICONN-1-BasicChat-Data-SuperLite) dataset, as requested by [@Enderchef](https://huggingface.co/Enderchef) in [mradermacher/model_requests discussion #918](https://huggingface.co/mradermacher/model_requests/discussions/918).

Trained with axolotl version `0.9.0` using the following configuration:
```yaml
base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: /apool/axolotl/0001.parquet
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00001

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>

```
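
As a rough check on what `lora_r: 16`, `lora_alpha: 32`, and `lora_target_linear: true` imply, the sketch below counts the adapter parameters and computes the LoRA scaling factor. The layer shapes are assumptions taken from the published Llama 3.1 8B architecture (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128, 32 layers), and the sketch assumes that `lora_target_linear` adapts the seven linear projections in each decoder layer and leaves `lm_head` untouched. None of this is stated in the config itself.

```python
# Assumed Llama 3.1 8B shapes (not taken from the config above).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP inner size
KV_DIM = 8 * 128       # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

# (in_features, out_features) for each linear projection in one decoder layer.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def total_lora_params(r: int = 16) -> int:
    """Trainable LoRA parameters across all decoder layers."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in LINEAR_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


if __name__ == "__main__":
    print(f"trainable LoRA parameters: {total_lora_params(16):,}")
    print(f"scaling factor: {lora_scaling(32, 16)}")
```

Under these assumptions the adapter has 41,943,040 trainable parameters, well under 1% of the 8B base model, and the low-rank update is scaled by 2.0.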

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 8.0
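
The relationship between these values can be sketched in a few lines. The first function reproduces `total_train_batch_size` from the per-device batch size, gradient accumulation, and device count. The second is a hand-written approximation of a cosine schedule with linear warmup, not the Trainer's own code. Its `total_steps` default of 240 is an assumption (8 epochs at roughly 30 optimizer steps each, as the results table suggests), not a value reported by the run.

```python
import math


def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, num_devices: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return micro_batch_size * grad_accum_steps * num_devices


def cosine_lr(step: int, base_lr: float = 1e-5, warmup_steps: int = 10,
              total_steps: int = 240) -> float:
    """Linear warmup followed by a half-cosine decay to zero.

    total_steps=240 is an assumed value, not one reported by the run.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(2, 4, 2))
    for step in (0, 5, 10, 125, 240):
        print(step, cosine_lr(step))
```

With 2 examples per device, 4 accumulation steps, and 2 GPUs, each optimizer step sees 16 examples, which matches the reported `total_train_batch_size`.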

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.5587        | 0.0336 | 1    | 3.4337          |
| 3.6702        | 0.2689 | 8    | 3.4260          |
| 3.5802        | 0.5378 | 16   | 3.3161          |
| 3.2421        | 0.8067 | 24   | 3.0272          |
| 2.3220        | 1.0672 | 32   | 2.4812          |
| 1.9774        | 1.3361 | 40   | 1.8708          |
| 1.5103        | 1.6050 | 48   | 1.3871          |
| 1.1904        | 1.8739 | 56   | 1.0542          |
| 1.0394        | 2.1345 | 64   | 0.8591          |
| 0.5501        | 2.4034 | 72   | 0.6723          |
| 0.2454        | 2.6723 | 80   | 0.5369          |
| 0.4499        | 2.9412 | 88   | 0.4286          |
| 0.2194        | 3.2017 | 96   | 0.3691          |
| 0.1172        | 3.4706 | 104  | 0.2802          |
| 0.0739        | 3.7395 | 112  | 0.1948          |
| 0.1524        | 4.0000 | 120  | 0.1457          |
| 0.0444        | 4.2689 | 128  | 0.1125          |
| 0.1385        | 4.5378 | 136  | 0.0759          |
| 0.0591        | 4.8067 | 144  | 0.0560          |
| 0.0252        | 5.0672 | 152  | 0.0460          |
| 0.0066        | 5.3361 | 160  | 0.0370          |
| 0.0230        | 5.6050 | 168  | 0.0252          |
| 0.0033        | 5.8739 | 176  | 0.0202          |
| 0.0029        | 6.1345 | 184  | 0.0168          |
| 0.0024        | 6.4034 | 192  | 0.0154          |
| 0.0103        | 6.6723 | 200  | 0.0146          |
| 0.0108        | 6.9412 | 208  | 0.0139          |
| 0.0049        | 7.2017 | 216  | 0.0138          |
| 0.0025        | 7.4706 | 224  | 0.0139          |
| 0.0036        | 7.7395 | 232  | 0.0136          |
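
A small sketch for reading the table: it estimates the number of optimizer steps per epoch from an (epoch, step) pair and picks the evaluation with the lowest validation loss. The `ROWS` list holds only a handful of rows copied from the table above, and the helper names are illustrative, not part of any library.

```python
# (epoch, step, validation_loss) for a few rows copied from the table.
ROWS = [
    (0.2689, 8, 3.4260),
    (4.0, 120, 0.1457),
    (6.9412, 208, 0.0139),
    (7.2017, 216, 0.0138),
    (7.4706, 224, 0.0139),
    (7.7395, 232, 0.0136),
]


def steps_per_epoch(epoch: float, step: int) -> float:
    """Estimate optimizer steps per epoch from one logged (epoch, step) pair."""
    return step / epoch


def best_eval(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, step, _ = ROWS[1]
    print("steps per epoch:", steps_per_epoch(epoch, step))
    best_epoch, best_step, best_loss = best_eval(ROWS)
    print(f"lowest validation loss {best_loss} at step {best_step} (epoch {best_epoch})")
```

The epoch 4.0 row gives 30 steps per epoch, so 8 epochs correspond to about 240 optimizer steps. The lowest validation loss among the logged evaluations is 0.0136 at step 232.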


### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.0+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1