✔️ modification of the cross-entropy loss function designed specifically for training LLMs. ✔️ twist on the standard cross-entropy loss by emphasizing the importance of outlier prediction errors and dynamically normalizing token-level variance. ✔️ more stable and efficient training, leading to models that generalize better.
Check it out, give it a spin, and let me know what you think!
Licensed under the Apache 2.0 license and ready to use. Happy training! 🔥🤖