Article CircleGuardBench: New Standard for Evaluating AI Moderation Models By whitecircle-ai and 7 others • May 7 • 53
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29 • 91
xLAM-2 Collection A family of Large Action Models for multi-turn conversation and tool use • 10 items • Updated May 5 • 16
📊 Commit Message Generation Evaluation 🔍 Collection All the resources for our "Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings" study on CMG metrics! • 7 items • Updated Mar 14 • 2
Wuerstchen: Efficient Pretraining of Text-to-Image Models Paper • 2306.00637 • Published Jun 1, 2023 • 12