
RL finetuning on this merge leads to model collapse

#11 · opened by radna

@Wanfq I tried running this a couple of different times, even with a different tokenizer because I thought that was the issue, but after just one iteration (not even a full epoch) the model starts outputting gibberish. Have you seen this happen, or is it only possible to finetune before merging?
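One common cause of immediate gibberish when finetuning a merged checkpoint is a mismatch between the tokenizer vocabulary and the model's embedding matrix. Below is a minimal, hedged sanity check, assuming a hypothetical local path `./merged-model` and the standard `transformers` API; it only verifies vocab/embedding alignment and runs a quick generation before any RL step, and is not the merge authors' procedure.

```python
# Sanity check before RL finetuning a merged checkpoint (illustrative sketch).
# Assumptions: model is at "./merged-model" (hypothetical path) and loads with
# the standard transformers AutoClasses.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./merged-model"  # hypothetical path, replace with the actual merge
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# 1) Check that tokenizer size does not exceed the embedding table;
#    a mismatch here often produces garbage tokens after the first update.
vocab_size = len(tokenizer)
embed_size = model.get_input_embeddings().num_embeddings
print(f"tokenizer vocab: {vocab_size}, model embeddings: {embed_size}")
if vocab_size > embed_size:
    print("Tokenizer is larger than the embedding matrix; resize before training.")

# 2) Generate once from the untrained merge to confirm it is coherent
#    before blaming the RL step for the collapse.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the merged model already generates fine here, the collapse is more likely coming from the RL setup (learning rate, KL penalty, or reference model mismatch) than from the tokenizer.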
