Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT Paper • 1904.09077 • Published Apr 19, 2019
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies Paper • 2305.16958 • Published May 26, 2023
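A minimal sketch of the objective the MixCE title describes: a weighted mixture of the standard forward cross-entropy (maximum likelihood) and a reverse cross-entropy term. The mixing weight η and the form of the reverse term below are illustrative assumptions; the paper's exact approximation of the reverse cross-entropy (which requires samples from the model and an estimate of the data distribution) may differ.

```latex
% Illustrative sketch only (not the paper's exact formulation):
% mix forward CE (MLE) with reverse CE using an assumed weight \eta.
\mathcal{L}_{\text{MixCE}}(\theta)
  = \eta \,\underbrace{\mathbb{E}_{x \sim P}\!\left[-\log Q_\theta(x)\right]}_{\text{forward CE (MLE)}}
  \; + \; (1 - \eta)\,\underbrace{\mathbb{E}_{x \sim Q_\theta}\!\left[-\log P(x)\right]}_{\text{reverse CE}}
```

Here $P$ denotes the data distribution and $Q_\theta$ the autoregressive model; the forward term penalizes the model for missing data modes, while the reverse term penalizes probability mass the model places on sequences unlikely under the data.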