The correct way of fine-tuning on multi-turn trajectories
#11 · opened by hr0nix
Looking at the Qwen 3 chat template, the last assistant turn always includes <think></think> tags, even in non-thinking mode, while the intermediate assistant turns never include reasoning traces or tags. This creates an asymmetry between the last assistant turn and all previous ones, and it makes it unclear how to fine-tune this model on multi-turn trajectories: if one simply trains on the whole trajectory with assistant-turn masking, the intermediate turns will be OOD, since they won't have thinking tags.
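For anyone who wants to see this directly, here's a minimal sketch that renders a multi-turn trajectory through the template (assuming the Qwen/Qwen3-8B tokenizer; the conversation content is made up):

```python
# Minimal sketch of the asymmetry described above.
# Assumes the Qwen/Qwen3-8B tokenizer; the conversation itself is made up.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>Trivial arithmetic.</think>It's 4."},
    {"role": "user", "content": "Multiply that by 3."},
    {"role": "assistant", "content": "<think>4 * 3 = 12.</think>That's 12."},
]

# Render the trajectory as it would be tokenized for fine-tuning:
# the reasoning block survives only in the final assistant turn.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```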
What's the recommended approach here? Should we always train only on the last turn, or should we simply ignore the asymmetry?
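To make the two options concrete, here's a hypothetical sketch of the masking schemes at the label level (`build_labels` and `assistant_spans` are made-up names; -100 is the standard ignore index for cross-entropy in PyTorch):

```python
# Hypothetical sketch of the two masking options; not from any official recipe.
import torch

def build_labels(input_ids: torch.Tensor, assistant_spans, last_turn_only=False):
    """assistant_spans: list of (start, end) token index ranges of assistant turns."""
    labels = torch.full_like(input_ids, -100)     # mask everything by default
    spans = assistant_spans[-1:] if last_turn_only else assistant_spans
    for start, end in spans:
        labels[start:end] = input_ids[start:end]  # unmask assistant tokens
    return labels

# Option A (last_turn_only=False): train on all assistant turns, accepting that
# intermediate ones lack <think> tags.
# Option B (last_turn_only=True): train only on the final turn, which carries them.
```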