Zephyr 7B SFT aligned with DPO on OASST1 with β=0.01

This repo contains a LoRA adapter created by aligning Zephyr 7B SFT on the OpenAssistant Conversations Dataset (OASST1) using Direct Preference Optimization (DPO). It was trained as part of a series of models for studying DPO alignment.

Model details

See the base model card for usage and chat template details.
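Below is a minimal sketch of loading this adapter with 🤗 PEFT and Transformers. It is an illustration, not a verified recipe: the base model is resolved from the adapter config, and the tokenizer and chat template may need to be loaded from the base model instead of this repo (see the base model card).

```python
# Minimal sketch: load the LoRA adapter with PEFT (assumes `peft` and `transformers` are installed).
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "jmajkutewicz/zephyr-7b-dpo_oasst1"

# AutoPeftModelForCausalLM reads the base model name from the adapter config,
# loads it, and attaches the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)  # may need the base model repo instead

messages = [{"role": "user", "content": "What is Direct Preference Optimization?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device), max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```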

Training hyperparameters

  • Epochs: 1
  • Batch size: 16
  • Learning rate: 1e-06
  • Learning rate scheduler: cosine
  • Learning rate warmup ratio: 0.1
  • Gradient accumulation: 2
  • LoRA:
    • rank: 64
    • alpha: 64
    • dropout: 0.05
    • target modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
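
For reference, the sketch below expresses these hyperparameters with TRL's DPOTrainer and a PEFT LoraConfig. It is an assumption-laden illustration, not the original training script: the base-model repo id, the toy stand-in dataset, and the interpretation of the batch size as per-device are all assumptions; the real OASST1 preprocessing into (prompt, chosen, rejected) triples is not shown.

```python
# Minimal sketch (not the original training script): the hyperparameters above
# expressed with TRL's DPOTrainer and a PEFT LoraConfig.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model_id = "HuggingFaceH4/mistral-7b-sft-beta"  # assumed Zephyr 7B SFT checkpoint

# Toy stand-in for the OASST1 preference pairs; the real preprocessing is not shown.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization aligns a model to preference data ..."],
    "rejected": ["I don't know."],
})

peft_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    beta=0.01,                       # DPO temperature from the model name
    num_train_epochs=1,
    per_device_train_batch_size=16,  # assumes the listed batch size is per device
    gradient_accumulation_steps=2,
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    output_dir="zephyr-7b-dpo_oasst1",
)

model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,      # older TRL versions pass `tokenizer=` instead
    peft_config=peft_config,         # DPOTrainer wraps the model with the LoRA adapter
)
trainer.train()
```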

License

This adapter is released under the Apache License 2.0.

Citation

If this work was helpful, please cite:

TBA