Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Paper • 2504.20752 • Published Apr 29, 2025 • 93 upvotes
You Do Not Fully Utilize Transformer's Representation Capacity
Paper • 2502.09245 • Published Feb 13, 2025 • 38 upvotes
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published Feb 16, 2025 • 165 upvotes
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
Paper • 2408.12076 • Published Aug 22, 2024 • 12 upvotes