Update README.md
README.md CHANGED

```diff
@@ -59,11 +59,11 @@ Two DPO fine-tuning experiments were run:
 - **Monitoring**: Weights & Biases (WandB)
 - **Best Epoch Selection**: Based on validation loss
 
-##
+## Intended Use
 
 This model is intended for research and experimentation with preference-based alignment and reward modeling. It is **not** production-ready and may produce hallucinated, biased, or unsafe outputs. Please evaluate carefully for downstream tasks.
 
-##
+## How to Use
 
 You can use the model with the `transformers` and `trl` libraries for inference or evaluation:
```
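The usage section ends just before the README's own code snippet, which is not shown in this diff. A minimal inference sketch with `transformers` might look like the following; note that `"your-username/dpo-model"` is a placeholder assumption, not the actual checkpoint ID from this repository:

```python
# Sketch: running inference with a DPO fine-tuned causal LM via transformers.
# The model ID below is a placeholder; substitute the real checkpoint path.
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_reply(model_id: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and generate a greedy completion for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_reply("your-username/dpo-model", "Explain DPO in one sentence."))
```

Given that the model is described as research-only, evaluating outputs before any downstream use (as the README advises) is the intended workflow.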