A novel training procedure to deslopify instruct/assistant models.

No SFT.

Pure RL with a good reward signal.