This model is part of the Evaluation of DPO Configurations collection (An Empirical Study of DPO Configuration Choices for LLM Alignment).
This repo contains a LoRA adapter created by aligning Zephyr 7B SFT on the OpenAssistant Conversations dataset (OASST1) using Direct Preference Optimization (DPO). It was trained as one of a series of models for studying DPO alignment.
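As a rough illustration of how an adapter like this can be produced, the sketch below combines TRL's DPOTrainer with a LoRA config. It is not the exact setup used for this adapter: the base checkpoint id, dataset id, and all hyperparameters are placeholders or assumptions, it assumes a recent trl release, and it assumes the preference data has already been converted into prompt/chosen/rejected pairs.

```python
# Minimal DPO + LoRA training sketch. Assumptions: recent trl release,
# a preference-formatted dataset, and placeholder ids/hyperparameters --
# not the exact configuration used for this adapter.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"  # assumed Zephyr 7B SFT checkpoint

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder dataset id: OASST1 must first be converted into
# prompt/chosen/rejected preference pairs.
train_dataset = load_dataset("your-org/oasst1-preference-pairs", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
args = DPOConfig(
    output_dir="zephyr-7b-dpo-lora",
    beta=0.1,  # strength of the KL-style penalty toward the reference policy
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model()  # writes the LoRA adapter weights to output_dir
```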
See the base model card for usage and chat template details.
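The sketch below shows one way to load the adapter on top of its SFT base model with the PEFT library. The adapter repo id is a placeholder and the base checkpoint id is an assumption; defer to the base model card for the exact identifiers and chat template.

```python
# Minimal inference sketch. Assumptions: placeholder adapter repo id and an
# assumed Zephyr 7B SFT base checkpoint -- see the base model card for details.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"  # assumed base checkpoint
adapter_id = "your-org/zephyr-7b-dpo-lora-oasst1"  # placeholder for this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "user", "content": "Explain Direct Preference Optimization in one paragraph."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```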
This adapter is released under the Apache License 2.0.
If this work was helpful, please cite:
TBA
Base model: mistralai/Mistral-7B-v0.1