Zephyr 7B SFT aligned with DPO on OASST1 with β=0.01

This repo contains a LoRA adapter created by aligning Zephyr 7B SFT on the OpenAssistant Conversations Dataset (OASST1) using Direct Preference Optimization (DPO). It was trained as part of a series of models for studying DPO alignment.

Model details

See the base model card for usage and chat template details.
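Below is a minimal sketch of loading this adapter with 🤗 PEFT and Transformers. It is an illustration, not a verified recipe: the base model is resolved from the adapter config, and the tokenizer and chat template may need to be loaded from the base model instead of this repo (see the base model card).

```python
# Minimal sketch: load the LoRA adapter with PEFT (assumes `peft` and `transformers` are installed).
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "jmajkutewicz/zephyr-7b-dpo_oasst1"

# AutoPeftModelForCausalLM reads the base model name from the adapter config,
# loads it, and attaches the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)  # may need the base model repo instead

messages = [{"role": "user", "content": "What is Direct Preference Optimization?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device), max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```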

Training hyperparameters

  • Epochs: 1
  • Batch size: 16
  • Learning rate: 1e-06
  • Learning rate scheduler: cosine
  • Learning rate warmup ratio: 0.1
  • Gradient accumulation: 2
  • LoRA:
    • rank: 64
    • alpha: 64
    • dropout: 0.05
    • target modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
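
For reference, the sketch below expresses these hyperparameters with TRL's DPOTrainer and a PEFT LoraConfig. It is an assumption-laden illustration, not the original training script: the base-model repo id, the toy stand-in dataset, and the interpretation of the batch size as per-device are all assumptions; the real OASST1 preprocessing into (prompt, chosen, rejected) triples is not shown.

```python
# Minimal sketch (not the original training script): the hyperparameters above
# expressed with TRL's DPOTrainer and a PEFT LoraConfig.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model_id = "HuggingFaceH4/mistral-7b-sft-beta"  # assumed Zephyr 7B SFT checkpoint

# Toy stand-in for the OASST1 preference pairs; the real preprocessing is not shown.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization aligns a model to preference data ..."],
    "rejected": ["I don't know."],
})

peft_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.01,                       # DPO temperature from the model name
    num_train_epochs=1,
    per_device_train_batch_size=16,  # assumes the listed batch size is per device
    gradient_accumulation_steps=2,
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    output_dir="zephyr-7b-dpo_oasst1",
)

model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,      # older TRL versions pass `tokenizer=` instead
    peft_config=peft_config,         # DPOTrainer wraps the model with the LoRA adapter
)
trainer.train()
```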

License

This adapter is released under the Apache License 2.0.

Citation

If this work was helpful, please cite:

TBA