SeqPE: Transformer with Sequential Position Encoding
Abstract
SeqPE, a fully learnable position encoding framework, enhances the adaptability and scalability of positional encodings in Transformers, improving performance across various tasks and enabling seamless multi-dimensional generalization.
Since self-attention layers in Transformers are permutation invariant by design, positional encodings must be explicitly incorporated to enable spatial understanding. However, the fixed-size lookup tables used in traditional learnable position embeddings (PEs) limit extrapolation beyond pre-trained sequence lengths. Expert-designed methods such as ALiBi and RoPE mitigate this limitation but demand extensive modifications to adapt to new modalities, underscoring fundamental challenges in adaptability and scalability. In this work, we present SeqPE, a unified and fully learnable position encoding framework that represents each n-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings in an end-to-end manner. To regularize SeqPE's embedding space, we introduce two complementary objectives: a contrastive objective that aligns embedding distances with a predefined position-distance function, and a knowledge distillation loss that anchors out-of-distribution position embeddings to in-distribution teacher representations, further enhancing extrapolation performance. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM), and accuracy--particularly under context length extrapolation--but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign. We release our code, data, and checkpoints at https://github.com/ghrua/seqpe.
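The abstract describes SeqPE's core mechanism: each (possibly multi-dimensional) position index is rendered as a symbolic sequence and encoded by a small learnable encoder. Below is a minimal, hypothetical PyTorch sketch of this idea; it is not the authors' released implementation, and the `SeqPosEncoder` name, the digit-token representation, and all module sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SeqPosEncoder(nn.Module):
    """Hypothetical sketch of a sequential position encoder.

    Each position index is rendered as a short sequence of digit tokens
    (one digit group per dimension, separated by a delimiter token), and a
    lightweight Transformer encoder maps that sequence to one embedding.
    """

    def __init__(self, d_model=256, max_digits=6, n_dims=1, n_layers=2):
        super().__init__()
        # Vocabulary: digits 0-9 plus a separator token (id 10) between dims.
        self.digit_emb = nn.Embedding(11, d_model)
        self.slot_emb = nn.Embedding(n_dims * (max_digits + 1), d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.max_digits = max_digits
        self.n_dims = n_dims

    def positions_to_tokens(self, positions):
        # positions: (batch, n_dims) integer tensor of position indices.
        tokens = []
        for dim in range(self.n_dims):
            p = positions[:, dim]
            digits = []
            for _ in range(self.max_digits):
                digits.append(p % 10)
                p = p // 10
            # Most-significant digit first, then a separator token.
            tokens.extend(reversed(digits))
            tokens.append(torch.full_like(positions[:, dim], 10))
        return torch.stack(tokens, dim=1)  # (batch, seq_len)

    def forward(self, positions):
        tok = self.positions_to_tokens(positions)
        slots = torch.arange(tok.size(1), device=tok.device)
        h = self.digit_emb(tok) + self.slot_emb(slots)
        h = self.encoder(h)
        # Mean-pool the digit sequence into one embedding per position.
        return h.mean(dim=1)  # (batch, d_model)

# Example: embeddings for 2D (row, col) patch positions, including indices
# far beyond what a fixed-size lookup table would have been trained on.
enc = SeqPosEncoder(n_dims=2)
pos = torch.tensor([[0, 0], [13, 7], [480, 512]])
print(enc(pos).shape)  # torch.Size([3, 256])
```

Because the encoder consumes a symbolic sequence rather than indexing a fixed table, the same module handles 1D and 2D positions and arbitrary index magnitudes; the contrastive and distillation objectives mentioned in the abstract would then be applied on top of these embeddings, with their exact formulations given in the paper.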
Community
Check out our paper for an alternative design of the position encoding method.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding (2025)
- A 2D Semantic-Aware Position Encoding for Vision Transformers (2025)
- PaTH Attention: Position Encoding via Accumulating Householder Transformations (2025)
- LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers (2025)
- Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation (2025)
- ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention (2025)
- Theoretical Analysis of Positional Encodings in Transformer Models: Impact on Expressiveness and Generalization (2025)