
RL finetuning on this merge leads to model collapse

#11 · opened by radna

@Wanfq I tried running this a couple of different times, even with a different tokenizer because I thought that was the issue, but after just one iteration (not even a full epoch) the model starts outputting gibberish. Have you seen this happen, or is it only possible to finetune before merging?
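One common cause of immediate gibberish when finetuning a merged checkpoint is a mismatch between the tokenizer vocabulary and the model's embedding matrix. Below is a minimal, hedged sanity check, assuming a hypothetical local path `./merged-model` and the standard `transformers` API; it only verifies vocab/embedding alignment and runs a quick generation before any RL step, and is not the merge authors' procedure.

```python
# Sanity check before RL finetuning a merged checkpoint (illustrative sketch).
# Assumptions: model is at "./merged-model" (hypothetical path) and loads with
# the standard transformers AutoClasses.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./merged-model"  # hypothetical path, replace with the actual merge
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# 1) Check that tokenizer size does not exceed the embedding table;
#    a mismatch here often produces garbage tokens after the first update.
vocab_size = len(tokenizer)
embed_size = model.get_input_embeddings().num_embeddings
print(f"tokenizer vocab: {vocab_size}, model embeddings: {embed_size}")
if vocab_size > embed_size:
    print("Tokenizer is larger than the embedding matrix; resize before training.")

# 2) Generate once from the untrained merge to confirm it is coherent
#    before blaming the RL step for the collapse.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the merged model already generates fine here, the collapse is more likely coming from the RL setup (learning rate, KL penalty, or reference model mismatch) than from the tokenizer.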
