flammen15X-mistral-7B
A Mistral 7B LLM built by merging pretrained models and finetuning on Jon Durbin's Gutenberg DPO set and Charles Goddard's Chai DPO set. Flammen specializes in exceptional character roleplay, creative writing, and general intelligence.
Method
Finetuned using an A100 on Google Colab. 🙏
Training follows the guide "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne.
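For readers unfamiliar with Direct Preference Optimization, the per-pair objective can be sketched in plain Python. This is a minimal illustration of the standard sigmoid DPO loss, not the trainer's actual implementation. The function name `dpo_loss` and its argument names are hypothetical, and the default `beta=0.1` simply mirrors the value used in this card's configuration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    All names here are illustrative, not taken from any library.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is zero and the
# loss is log(2); it drops as the policy favors the chosen response.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

A larger `beta` penalizes drift from the reference model more strongly, so the same log-probability shift produces a larger change in the loss.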
Configuration
LoRA, model, and training settings:
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# model_name, new_model, dataset, and tokenizer are not defined in this
# snippet; they must be set up before it runs.

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
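The training arguments above imply a fairly short run with a long warmup. As a rough sketch, the following plain-Python snippet works out the effective batch size and the shape of a linear-warmup, cosine-decay learning-rate curve from those numbers. It assumes a single GPU (the card mentions one A100) and the common warmup-then-half-cosine formula; the helper name `lr_at` is hypothetical, and the values the trainer actually logs may differ slightly.

```python
import math

# Values copied from the training arguments in this card.
BASE_LR = 2e-5
WARMUP_STEPS = 100
MAX_STEPS = 200
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 2


def lr_at(step):
    """Approximate learning rate at an optimizer step (illustrative helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumes one GPU, so no multiplication by a device count.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # 4 pairs per optimizer step
pairs_seen = effective_batch * MAX_STEPS         # 800 pairs over the run

for step in (0, 50, 100, 150, 200):
    print(step, lr_at(step))
```

Under these assumptions, half of the 200 steps are spent warming up, so the peak rate of 2e-5 is reached only at step 100 and the decay begins immediately afterward.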
Model tree for flammenai/flammen15X-mistral-7B
Base model
flammenai/flammen15-mistral-7B
Finetuned
flammenai/flammen15-gutenberg-DPO-v1-7B