normster
/

RealGuardrails-Llama3.1-8B-Instruct-SFT-DPO

Text Generation

text-generation-inference

Model card Files Files and versions

RealGuardrails-Llama3.1-8B-Instruct-SFT-DPO / README.md

normster's picture

Upload README.md with huggingface_hub

4cac1fd verified 7 months ago

|

history blame contribute delete

1.18 kB

	---
	license: mit
	datasets:
	- normster/RealGuardrails
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	- normster/RealGuardrails-Llama3.1-8B-Instruct-SFT
	library_name: transformers
	---

	# RealGuardrails Models

	This model was trained on the [RealGuardrails](https://huggingface.co/datasets/normster/RealGuardrails) dataset, an instruction-tuning dataset focused on improving system prompt adherence and precedence. In particular, it was trained via SFT on the `systemmix` split (150K examples) using our custom training library [torchllms](https://github.com/normster/torchllms) (yielding [normster/RealGuardrails-Llama3.1-8B-Instruct-SFT](https://huggingface.co/normster/RealGuardrails-Llama3.1-8B-Instruct-SFT)), and then trained via DPO on the `preferencemix` split (30K examples), and converted back to a `transformers` compatible checkpoint.

	## Training Hyperparameters

	\| Name \| Value \|
	\| :--- \| :--- \|
	\| DPO beta \| 0.01 \|
	\| optimizer \| AdamW \|
	\| batch size \| 128 \|
	\| learning rate \| 1e-5 \|
	\| lr scheduler \| cosine with 50 warmup steps \|
	\| betas \| (0.9, 0.999) \|
	\| eps \| 1e-8 \|
	\| weight decay \| 0 \|
	\| epochs \| 1 \|
	\| max grad norm \| 1.0 \|
	\| precision \| bf16 \|
	\| max length \| 4096 \|