# mistral-nemo-gutenberg3-12B
Mahou-1.5-mistral-nemo-12B-lorablated fine-tuned on jondurbin/gutenberg-dpo-v0.1, nbeerbower/gutenberg2-dpo, and nbeerbower/gutenberg-moderne-dpo.
## Method
ORPO tuned on 8x A100 GPUs for 2 epochs.
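The three Gutenberg DPO datasets listed above could be combined along these lines (a minimal sketch using the `datasets` library; the split name and preference-column layout are assumptions, not taken from this card):

```python
from datasets import load_dataset, concatenate_datasets

# Assumed: each dataset ships a "train" split with prompt/chosen/rejected columns,
# which is the preference format ORPOTrainer expects.
dataset = concatenate_datasets([
    load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train"),
    load_dataset("nbeerbower/gutenberg2-dpo", split="train"),
    load_dataset("nbeerbower/gutenberg-moderne-dpo", split="train"),
])
```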
QLoRA config:
```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

torch_dtype = torch.bfloat16  # compute dtype; bfloat16 matches bf16=True in the training config below

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# LoRA config
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)
```
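For context, the 4-bit base model would typically be loaded with this quantization config roughly as follows (a sketch; the full hub repo id for the base model and the loading code are not given on this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import prepare_model_for_kbit_training

# Placeholder: substitute the full hub repo id of the base model named above.
base_model = "Mahou-1.5-mistral-nemo-12B-lorablated"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,  # 4-bit NF4 config from above
    torch_dtype=torch_dtype,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # standard prep for PEFT training on a k-bit model
```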
Training config:
```python
from trl import ORPOConfig

new_model = "mistral-nemo-gutenberg3-12B"  # name for the fine-tuned model / W&B run

orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=8e-6,
    lr_scheduler_type="linear",
    max_length=4096,
    max_prompt_length=2048,
    max_completion_length=2048,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
    gradient_checkpointing=True,
)
```
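Put together, the run would be driven by TRL's ORPOTrainer roughly like this (a sketch under the assumptions above; the eval split ratio and the final save step are placeholders, not taken from this card):

```python
from trl import ORPOTrainer

# evaluation_strategy="steps" implies a held-out set; the 1% ratio here is an assumption.
split = dataset.train_test_split(test_size=0.01)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    tokenizer=tokenizer,      # newer TRL releases use processing_class= instead
    peft_config=peft_config,  # LoRA adapter config from above
)
trainer.train()
trainer.save_model(new_model)
```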