Llama 3.1 Instruct SPPO
Llama 3.1 models fine-tuned with Self-Play Preference Optimization (SPPO): https://uclaml.github.io/SPPO/