Fix BF16 training

#19

by alexanderchemeris - opened Jun 3

base: refs/heads/main ← from: refs/pr/19


alexanderchemeris

Jun 3

For long sequences, this calculation yields an incorrect result because BF16 has too few mantissa bits to represent the count exactly. E.g., for 640 elements, it produces valid_lengths = 639. Converting to long early solves the issue.
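To illustrate the general precision limit, here is a minimal pure-Python sketch that emulates BF16 rounding (8 significant bits, round-to-nearest-even). The helper `to_bf16` and the all-ones `mask` list are hypothetical stand-ins, not code from this repository. The calculation the PR changes is not shown in this thread, and PyTorch's reduction kernels may accumulate in higher precision, so this does not reproduce the exact 639 reported above. It only shows why integer counts above 256 are unsafe in BF16 and why counting in an integer type avoids the problem.

```python
import struct


def to_bf16(value: float) -> float:
    """Round a float to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration: bfloat16 is the top 16 bits
    of the float32 encoding, so we round away the lower 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add just under half an ulp, plus one if the kept LSB is odd,
    # which implements round-to-nearest-even on the dropped bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# Assumed setup: a mask of 640 ones, matching the sequence length in the report.
mask = [1.0] * 640

# Accumulating step by step in emulated BF16: once the total reaches 256,
# adding 1 gives 257, which rounds back to 256, so the sum stops growing.
bf16_running_sum = 0.0
for m in mask:
    bf16_running_sum = to_bf16(bf16_running_sum + m)

# Counting in integers (the analogue of converting to long first) is exact.
exact_count = sum(int(m) for m in mask)

print(to_bf16(257.0))     # 256.0: 257 is not representable in BF16
print(to_bf16(639.0))     # 640.0: between 512 and 1024, BF16 steps by 4
print(bf16_running_sum)   # 256.0
print(exact_count)        # 640
```

The takeaway is the same as in the description above: any length-like quantity is safer computed after casting to an integer type than in BF16.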

Fix BF16 training 3633e9c8

xiezhe24

bytedance-research org Jun 4

Thank you for pointing out this issue. However, I think it has already been fixed at line 133:

mask = x[:, :, -1].long()

Do you think there are still issues with the current code?

alexanderchemeris

Jun 4

Sorry, I didn't notice this recent fix. I think it's equivalent. I'll check and come back.

xiezhe24 changed pull request status to closed 17 days ago
