Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Paper • 2506.05229 • Published 10 days ago • 37
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published Feb 18 • 73
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published Jan 22 • 68
The Second Conversational Intelligence Challenge (ConvAI2) Paper • 1902.00098 • Published Jan 31, 2019
ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ) Paper • 2009.11352 • Published Sep 23, 2020
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary Paper • 2205.02340 • Published May 4, 2022
Scaling Transformer to 1M tokens and beyond with RMT Paper • 2304.11062 • Published Apr 19, 2023 • 3
Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information Paper • 2311.01326 • Published Nov 2, 2023 • 2
Uncertainty Guided Global Memory Improves Multi-Hop Question Answering Paper • 2311.18151 • Published Nov 29, 2023