Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Tsinghua University
Submitted by huggingaaaaa