Llama-3.1-8B-Instruct-dpo-mistral-1000 / training_rewards_accuracies.png
End of training
1fbe65a verified