Padding of labels bug?
I'm currently reimplementing the audio training sample code using PyTorch Lightning, and while debugging an issue I noticed this in the collator:
```python
labels = pad_sequence(labels_list, padding_side='left', padding_value=0)
```
When batching, should the labels not be padded with `_IGNORE_INDEX`?
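
For clarity, this is roughly what I would have expected instead (just a minimal sketch, assuming `_IGNORE_INDEX = -100` and that `pad_sequence` is `torch.nn.utils.rnn.pad_sequence`; `padding_side` needs a fairly recent PyTorch, but the snippet above already uses it, and `batch_first=True` is only there to make the printed example easier to read):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

_IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def collate_labels(labels_list):
    # pad with _IGNORE_INDEX instead of 0 so the padded positions
    # are skipped by the cross-entropy loss
    return pad_sequence(
        labels_list,
        batch_first=True,             # only for readability of the example
        padding_value=_IGNORE_INDEX,
        padding_side='left',          # same left padding as in the snippet above
    )

labels_list = [torch.tensor([5, 6, 7]), torch.tensor([8, 9])]
print(collate_labels(labels_list))
# tensor([[   5,    6,    7],
#         [-100,    8,    9]])
```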
I think the attention mask will handle it.
I think it does matter when calculating the loss, but the HF trainer is probably handling this case, i.e. converting the padding to -100 before the loss calculation.
Yes, I also tried to understand what exactly happens here. The labels tensor arrives at the loss calculation (ForCausalLMLoss) with the 0's still in place, e.g. in this part. Up to that point the 0's were untouched, and in ForCausalLMLoss I did not see anything that ignores the 0's or converts them to -100.
So in the loss calculation, the 0's are still there. I have also seen that my model starts learning to predict "!" at the beginning (presumably because token id 0 decodes to "!" in the tokenizer):
- PRED: ! ist der We so Trier bis zwei Jahren ratsam.
Is it really a bug? Or is there perhaps some hidden mechanism somewhere that I have missed?
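
For reference, a small self-contained check (not the actual sample code) of why I think the 0's matter: `F.cross_entropy`, which `ForCausalLMLoss` ends up calling, only skips positions whose label equals `ignore_index` (-100 by default), so labels padded with 0 are treated as real targets for token id 0.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size = 10
logits = torch.randn(1, 5, vocab_size)  # (batch, seq_len, vocab)

# the same 3 real labels, with the 2 trailing positions padded two different ways
labels_ignore = torch.tensor([[3, 7, 2, -100, -100]])  # padded with -100
labels_zero   = torch.tensor([[3, 7, 2,    0,    0]])  # padded with 0

def ce(labels):
    # F.cross_entropy skips only positions equal to ignore_index (-100 by default)
    return F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

print(ce(labels_ignore))  # averaged over the 3 real tokens only
print(ce(labels_zero))    # padded positions count as "predict token id 0"
```

So with 0-padded labels, every padded position trains the model to predict token id 0, which would match the "!" I see at the start of the predictions.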