license: mit | |
datasets: | |
- normster/RealGuardrails | |
base_model: | |
- meta-llama/Llama-3.1-8B-Instruct | |
- normster/RealGuardrails-Llama3.1-8B-Instruct-SFT | |
library_name: transformers | |
# RealGuardrails Models | |
This model was trained on the [RealGuardrails](https://huggingface.co/datasets/normster/RealGuardrails) dataset, an instruction-tuning dataset focused on improving system prompt adherence and precedence. In particular, it was trained via SFT on the `systemmix` split (150K examples) using our custom training library [torchllms](https://github.com/normster/torchllms) (yielding [normster/RealGuardrails-Llama3.1-8B-Instruct-SFT](https://huggingface.co/normster/RealGuardrails-Llama3.1-8B-Instruct-SFT)), and then trained via DPO on the `preferencemix` split (30K examples), and converted back to a `transformers` compatible checkpoint. | |
## Training Hyperparameters | |
| Name | Value | | |
| :--- | :--- | | |
| DPO beta | 0.01 | | |
| optimizer | AdamW | | |
| batch size | 128 | | |
| learning rate | 1e-5 | | |
| lr scheduler | cosine with 50 warmup steps | | |
| betas | (0.9, 0.999) | | |
| eps | 1e-8 | | |
| weight decay | 0 | | |
| epochs | 1 | | |
| max grad norm | 1.0 | | |
| precision | bf16 | | |
| max length | 4096 | |