Papers
arxiv:2503.22937

SSM-RDU: A Reconfigurable Dataflow Unit for Long-Sequence State-Space Models

Published on Mar 29
Authors:

Abstract

An architectural extension to a Reconfigurable Dataflow Unit enables efficient execution of FFT and scan operations in long-sequence state-space models, outperforming GPUs.

AI-generated summary

Long-sequence state-space models (SSMs) such as Hyena and Mamba replace the quadratic complexity of self-attention with more efficient FFT and scan operations. However, modern accelerators like GPUs are poorly suited to these non-GEMM workloads due to rigid execution models and specialization for dense matrix operations. This paper proposes architectural extensions to a baseline Reconfigurable Dataflow Unit (RDU) that efficiently support FFT-based and scan-based SSMs. By introducing lightweight interconnect enhancements within compute tiles, the extended RDU enables spatial mapping of FFT and scan dataflows with less than 1% area and power overhead. The resulting architecture achieves a 5.95X speedup over the GPU and a 1.95X speedup over the baseline RDU for Hyena, and a 2.12X and 1.75X speedup over the GPU and baseline RDU, respectively, for Mamba.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.22937 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.22937 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.22937 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.