---
library_name: transformers
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Instruct-2407
tags:
- roleplay
- conversational
- axolotl
---
# Remnant MN 12b (series 1)
[English](./README.md) | [简体中文](./README-cn.md)
*There's a wisp of dust in the air. It feels like it's from a bygone era, but you don't know where from. It lands on your tongue. It tastes nice.*

Remnant is a series of finetuned LLMs focused on SFW and NSFW roleplaying and conversation.
## Quants
GGUF:
- Todo!
EXL3:
- Todo!
EXL2:
- Todo!
MISC:
- Todo!
## Recommended Settings
- Chat template: Mistral v7 Tekken
- Samplers: IDK! Your mileage may vary!
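If your frontend has no Mistral v7 Tekken preset, the format is easy to build by hand. A minimal sketch in plain Python, mirroring the chat template used during training (the `<s>`/`</s>` BOS/EOS strings are the usual Mistral-Nemo defaults and are assumed here; `render_prompt` is an illustrative helper, not a library function):

```python
# Assumed Mistral-Nemo special tokens; check the tokenizer config to be sure.
BOS = "<s>"
EOS = "</s>"

def render_prompt(messages: list[dict]) -> str:
    """Render a conversation in the Mistral v7 Tekken style:
    system -> [SYSTEM_PROMPT]...[/SYSTEM_PROMPT],
    user -> [INST]...[/INST],
    assistant -> content + EOS."""
    out = BOS
    for m in messages:
        if m["role"] == "system":
            out += f"[SYSTEM_PROMPT]{m['content']}[/SYSTEM_PROMPT]"
        elif m["role"] == "user":
            out += f"[INST]{m['content']}[/INST]"
        elif m["role"] == "assistant":
            out += m["content"] + EOS
    return out

messages = [
    {"role": "system", "content": "You are a friendly roleplay partner."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hey there."},
]
print(render_prompt(messages))
```

In practice, prefer the tokenizer's own `apply_chat_template` so the special tokens come from the model's config rather than hard-coded strings.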
## Credits
Humongous thanks to Allura, ilya <3
Big thanks to the developers of Axolotl (whose training framework I used), Mistral (whose model I used), Nebius (whose GPUs I used), and my bank (whose debit card I used).
## Misc
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.8.0.dev0`
```yaml
# === Model Configuration ===
base_model: mistralai/Mistral-Nemo-Instruct-2407 # e.g. "mistralai/Mistral-Small-24B-Instruct-2501"
load_in_8bit: false
load_in_4bit: false
# === Training Setup ===
num_epochs: 2
micro_batch_size: 16
gradient_accumulation_steps: 1
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
# === Hyperparameter Configuration ===
optimizer: apollo_adamw
# Apollo-mini configuration:
optim_args: "proj=random,rank=1,scale=128.0,scale_type=tensor,update_proj_gap=200"
# Regular Apollo configuration:
# optim_args:
optim_target_modules: all_linear
learning_rate: 1e-5
lr_scheduler: rex
weight_decay: 0.01
warmup_ratio: 0.05
# === Data Configuration ===
datasets:
  - path: allura-org/inkmix-v3.0
    type: chat_template
    split: train
    field_messages: conversations
    message_field_role: from
    message_field_content: value
dataset_prepared_path: last_run_prepared
chat_template: jinja
chat_template_jinja: |
  {{- bos_token }}{%- for message in messages %}
  {%- if message['role'] == 'system' %}
  {{- '[SYSTEM_PROMPT]' + message['content'] + '[/SYSTEM_PROMPT]' }}
  {%- elif message['role'] == 'user' %}
  {{- '[INST]' + message['content'] + '[/INST]' }}
  {%- elif message['role'] == 'assistant' %}
  {{- message['content'] + eos_token }}
  {%- endif %}
  {%- endfor %}
# === Plugins ===
plugins:
- axolotl.integrations.liger.LigerPlugin
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
# === Hardware Optimization ===
gradient_checkpointing: unsloth
gradient_checkpointing_kwargs:
  use_reentrant: false
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
cut_cross_entropy: true
torch_compile: true
# Only if using multiple GPUs:
# deepspeed: [DEEPSPEED_CONFIG_PATH] # e.g. "deepspeed_configs/zero3_bf16.json"
# === Wandb Tracking ===
wandb_project: nemo12b-inkmix-v3
# === Checkpointing ===
saves_per_epoch: 2
save_total_limit: 3
# === Advanced Settings ===
output_dir: offload
bf16: auto
flash_attention: true
train_on_inputs: false
group_by_length: false
logging_steps: 1
trust_remote_code: true
# Nemo doesn't support a system prompt out of the box
tokens:
  - "[SYSTEM_PROMPT]"
  - "[/SYSTEM_PROMPT]"
special_tokens:
  pad_token: "<pad>"
```
</details>