PAINTED FANTASY VISAGE

Mistrall Small 3.2 Upscaled 33B

Overview

Another experimental release. Mistral Small 3.2 24B upscaled by 18 layers to create a 33.6B model. This model then went through pretraining, SFT & DPO.

Can't guarantee the Mistral 3.2 repetition issues are fixed, but this model seems to be less repetitive than my previous attempt.

This is an uncensored creative model intended to excel at character driven RP / ERP where characters are portrayed creatively and proactively.

SillyTavern Settings

Recommended Roleplay Format

> Actions: In plaintext

> Dialogue: "In quotes"

> Thoughts: *In asterisks*

Recommended Samplers

> Temp: 0.6

> MinP: 0.03 - 0.05

> TopP: 0.95 - 1.0

> Dry: 0.8, 1.75, 4

Instruct

Mistral v7 Tekken

Quantizations

GGUF

Static (mradermacher)

iMatrix (mradermacher)

EXL3

3bpw

4bpw

5bpw

6bpw

Creation Process

Creation process: Upscale > Pretrain > SFT > DPO

All training was qlora (including pretrain).

Pretrained on 177MB of data. Dataset consisteted mostly of Light Novels, NSFW stories, SFW stories and filled out with general corpus text from Huggingface FineWeb-2 dataset.

The model then went through SFT using a dataset of approx 3.6 million tokens, 700 RP conversations, 1000 creative writing / instruct samples and about 100 summaries. The bulk of this data has been made public.

Finally, DPO was used to make the model more consistent.

base_model: anthracite-core/Mistral-Small-3.2-24B-Instruct-2506-Text-Only
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: anthracite-core/Mistral-Small-3.2-24B-Instruct-2506-Text-Only
        layer_range: [0, 29]
  - sources:
      - model: anthracite-core/Mistral-Small-3.2-24B-Instruct-2506-Text-Only
        layer_range: [10, 39]

Not optimized for cost / performance efficiency, YMMV.

SFT 1*H100

# ====================
# MODEL CONFIGURATION
# ====================
base_model: ./Upscale_Mistral-PT/merged
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: mistral_v7_tekken


# ====================
# DATASET CONFIGURATION
# ====================
datasets:
  - path: ./dataset.jsonl
    type: chat_template
    split: train
    chat_template_strategy: tokenizer
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]

dataset_prepared_path:
train_on_inputs: false  # Only train on assistant responses
# ====================
# QLORA CONFIGURATION
# ====================
adapter: qlora
load_in_4bit: true
lora_r: 128
lora_alpha: 128
lora_dropout: 0.1
lora_target_linear: true
# lora_modules_to_save:  # Uncomment only if you added NEW tokens
# ====================
# TRAINING PARAMETERS
# ====================
num_epochs: 2
micro_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1.5e-5
optimizer: paged_adamw_8bit
lr_scheduler: rex
warmup_ratio: 0.05
weight_decay: 0.01
max_grad_norm: 1.0
# ====================
# SEQUENCE & PACKING
# ====================
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
# ====================
# HARDWARE OPTIMIZATIONS
# ====================
bf16: auto
flash_attention: true
gradient_checkpointing: true
# ====================
# EVALUATION & CHECKPOINTING
# ====================
save_strategy: steps
save_steps: 5
save_total_limit: 5  # Keep best + last few checkpoints
load_best_model_at_end: true
greater_is_better: false
# ====================
# LOGGING & OUTPUT
# ====================
output_dir: ./Upscale_Mistral-PT-SFT-2
logging_steps: 2
save_safetensors: true
# ====================
# WANDB TRACKING
# ====================
wandb_project: MS3-2-SFT
wandb_entity: your_entity
wandb_name: run_name

zerofata
/

MS3.2-PaintedFantasy-Visage-33B