Artificial Hippocampus Networks for Efficient Long-Context Modeling
Abstract
A memory framework that combines lossless short-term memory (a sliding-window KV cache) with compressed, fixed-size long-term memory improves the efficiency and performance of long-sequence modeling.
Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework for artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive experiments on the long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding-window baselines and achieve performance comparable or even superior to full-attention models, while substantially reducing computational and memory requirements. For instance, augmenting Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5% and memory cache by 74.0%, while improving its average score on LV-Eval (128k sequence length) from 4.41 to 5.88. Code is available at: https://github.com/ByteDance-Seed/AHN.
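To make the mechanism concrete, below is a minimal PyTorch sketch of the general idea described in the abstract: exact attention over a sliding window of recent key/value pairs (lossless short-term memory), with evicted KV pairs folded into a fixed-size recurrent state (compressed long-term memory). The gated outer-product update and the `SlidingWindowWithCompressiveMemory` / `gate` names are illustrative assumptions standing in for the paper's actual AHN instantiations (Mamba2, DeltaNet, Gated DeltaNet); see the official repository for the real implementation.

```python
import torch
import torch.nn as nn


class SlidingWindowWithCompressiveMemory(nn.Module):
    """Sketch: softmax attention over a sliding KV window (short-term memory)
    plus a fixed-size recurrent state that absorbs evicted KV pairs
    (long-term memory). The update rule here is a simple gated
    outer-product recurrence, not the paper's exact AHN formulation."""

    def __init__(self, dim: int, window: int):
        super().__init__()
        self.dim = dim
        self.window = window
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Hypothetical gate controlling how strongly old memory is retained
        # when an evicted KV pair is written into the fixed-size state.
        self.gate = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); processed token by token for clarity.
        B, T, D = x.shape
        mem = x.new_zeros(B, D, D)      # fixed-size long-term memory state
        k_win, v_win = [], []           # lossless in-window KV cache
        outputs = []
        for t in range(T):
            xt = x[:, t]                                   # (B, D)
            q, k, v = self.q_proj(xt), self.k_proj(xt), self.v_proj(xt)
            k_win.append(k)
            v_win.append(v)
            if len(k_win) > self.window:
                # Evict the oldest KV pair and compress it into the memory
                # state with a gated outer-product update (assumption).
                k_old, v_old = k_win.pop(0), v_win.pop(0)
                g = torch.sigmoid(self.gate(k_old)).unsqueeze(-1)  # (B, 1, 1)
                mem = g * mem + torch.einsum('bd,be->bde', k_old, v_old)
            # Read from long-term memory (linear-attention-style lookup).
            mem_out = torch.einsum('bd,bde->be', q, mem) / (D ** 0.5)
            # Exact softmax attention over the in-window KV pairs.
            K = torch.stack(k_win, dim=1)                  # (B, W, D)
            V = torch.stack(v_win, dim=1)
            scores = torch.einsum('bd,bwd->bw', q, K) / (D ** 0.5)
            win_out = torch.einsum('bw,bwd->bd', torch.softmax(scores, -1), V)
            outputs.append(win_out + mem_out)
        return torch.stack(outputs, dim=1)                 # (B, T, D)


if __name__ == "__main__":
    layer = SlidingWindowWithCompressiveMemory(dim=64, window=8)
    y = layer(torch.randn(2, 32, 64))
    print(y.shape)  # torch.Size([2, 32, 64])
```

The key property the sketch illustrates is that per-token compute and memory stay bounded by the window size plus a constant-size state, rather than growing with sequence length as a full KV cache does.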
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- SCOUT: Toward Sub-Quadratic Attention via Segment Compression for Optimized Utility in Transformers (2025)
- Native Hybrid Attention for Efficient Sequence Modeling (2025)
- MemMamba: Rethinking Memory Patterns in State Space Model (2025)
- KV-Efficient VLA: A Method of Speed up Vision Language Model with RNN-Gated Chunked KV Cache (2025)
- Short window attention enables long-term memorization (2025)
- LAWCAT: Efficient Distillation from Quadratic to Linear Attention with Convolution across Tokens for Long Context Modeling (2025)
- Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention (2025)
Models citing this paper: 9
Datasets citing this paper: 0
Spaces citing this paper: 0