CCMat's Collections
Transformers & Attention

updated Feb 5

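Short, hedged code sketches of each paper's core idea follow the list.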
  • Linear Transformers with Learnable Kernel Functions are Better In-Context Models

    Paper • 2402.10644 • Published Feb 16, 2024 • 82

  • Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

    Paper • 2401.04658 • Published Jan 9, 2024 • 28

  • KAN: Kolmogorov-Arnold Networks

    Paper • 2404.19756 • Published Apr 30, 2024 • 113

  • Your Transformer is Secretly Linear

    Paper • 2405.12250 • Published May 19, 2024 • 159

  • Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

    Paper • 2405.12981 • Published May 21, 2024 • 34

  • Block Transformer: Global-to-Local Language Modeling for Fast Inference

    Paper • 2406.02657 • Published Jun 4, 2024 • 41
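The first paper studies linear attention whose kernel feature map is learned rather than fixed. Below is a minimal causal linear-attention sketch with a learnable squared affine feature map; the module name and the exact kernel parameterization are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class LearnableKernelLinearAttention(nn.Module):
    """Causal linear attention with a learnable feature map.

    Sketch only: the feature map is an elementwise affine transform
    followed by squaring, which keeps attention weights non-negative.
    The paper's exact parameterization and normalization differ.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.gamma = nn.Parameter(torch.ones(dim))   # learnable kernel scale
        self.beta = nn.Parameter(torch.zeros(dim))   # learnable kernel shift

    def feature_map(self, x: torch.Tensor) -> torch.Tensor:
        return (self.gamma * x + self.beta) ** 2     # non-negative kernel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.feature_map(self.q_proj(x))
        k = self.feature_map(self.k_proj(x))
        v = self.v_proj(x)
        # Causal prefix sums replace the softmax attention matrix:
        # out_t = phi(q_t) @ sum_{s<=t} phi(k_s) v_s^T, normalized.
        # (Materializing the per-step state is O(seq * dim^2) memory;
        # real kernels compute this in chunks.)
        kv = torch.einsum("bsd,bse->bsde", k, v).cumsum(dim=1)
        z = k.cumsum(dim=1)
        num = torch.einsum("bsd,bsde->bse", q, kv)
        den = torch.einsum("bsd,bsd->bs", q, z).clamp(min=1e-6)
        return num / den.unsqueeze(-1)
```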
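Lightning Attention-2 computes causal linear attention tile by tile: exact masked attention inside each block, plus a running KV state that summarizes all earlier blocks. The sketch below shows that intra/inter split only; it omits the paper's feature maps, decay term, and normalization.

```python
import torch

def blockwise_causal_linear_attention(q, k, v, block: int = 64):
    """Causal linear attention computed block by block (sketch).

    Intra-block: exact masked Q K^T within the tile.
    Inter-block: a running (d x e) KV state covers all past tiles.
    """
    b, s, d = q.shape
    e = v.shape[-1]
    out = torch.empty(b, s, e, dtype=q.dtype, device=q.device)
    state = torch.zeros(b, d, e, dtype=q.dtype, device=q.device)
    mask = torch.tril(torch.ones(block, block, dtype=torch.bool, device=q.device))
    for start in range(0, s, block):
        end = min(start + block, s)
        qb, kb, vb = q[:, start:end], k[:, start:end], v[:, start:end]
        n = end - start
        # Intra-block: causal (masked) attention inside the tile.
        scores = (qb @ kb.transpose(1, 2)).masked_fill(~mask[:n, :n], 0.0)
        intra = scores @ vb
        # Inter-block: contribution of all earlier tiles via the KV state.
        inter = qb @ state
        out[:, start:end] = intra + inter
        # Fold this tile into the running state before moving on.
        state = state + kb.transpose(1, 2) @ vb
    return out
```

The key point is that the loop never materializes a full seq-by-seq attention matrix, so cost grows linearly with sequence length.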
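KAN replaces fixed activations on nodes with a learnable univariate function on every edge. The paper parameterizes those functions with B-splines; the sketch below substitutes a fixed Gaussian RBF basis with learnable coefficients to keep the code short, so the basis choice and class name are assumptions.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """One Kolmogorov-Arnold layer: y_j = sum_i f_ij(x_i).

    Each edge (i, j) carries its own learnable 1-D function,
    expressed here as a combination of Gaussian RBF basis functions
    (the paper uses B-splines).
    """

    def __init__(self, in_dim: int, out_dim: int, n_basis: int = 8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-2, 2, n_basis))
        # One coefficient vector per edge: (out_dim, in_dim, n_basis).
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis: (batch, in_dim, n_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))
        # Evaluate every edge function and sum over the inputs.
        return torch.einsum("bin,oin->bo", basis, self.coef)

# Stacking layers gives a deep KAN, e.g.:
model = nn.Sequential(KANLayer(4, 16), KANLayer(16, 1))
```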
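"Your Transformer is Secretly Linear" reports that the transformation between consecutive decoder layers is close to linear. One way to probe that, sketched below: fit a least-squares linear map between the hidden states entering and leaving a layer and report explained variance. The paper's actual metric is a normalized Procrustes-style similarity, so this function is a simplified stand-in.

```python
import torch

def linearity_score(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """How well a single linear map explains one transformer layer.

    h_in, h_out: (n_tokens, dim) hidden states before/after a layer.
    Returns explained variance in [0, 1]; values near 1 mean the
    layer acts almost linearly on these tokens.
    """
    h_in = h_in - h_in.mean(dim=0)
    h_out = h_out - h_out.mean(dim=0)
    w = torch.linalg.lstsq(h_in, h_out).solution   # best linear map
    resid = h_out - h_in @ w
    return 1.0 - (resid.norm() / h_out.norm()).item() ** 2
```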
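Cross-Layer Attention shrinks the KV cache by letting layers share keys and values. A sketch, assuming the simplest variant in which every second layer reuses the KV of the layer below it, halving what must be cached; norms, MLPs, and multi-head splitting are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLABlock(nn.Module):
    """Attention block that can reuse K/V from the previous layer."""

    def __init__(self, dim: int, owns_kv: bool):
        super().__init__()
        self.owns_kv = owns_kv
        self.q_proj = nn.Linear(dim, dim)
        if owns_kv:
            self.k_proj = nn.Linear(dim, dim)
            self.v_proj = nn.Linear(dim, dim)

    def forward(self, x, shared_kv=None):
        q = self.q_proj(x)
        if self.owns_kv:
            kv = (self.k_proj(x), self.v_proj(x))   # goes into the KV cache
        else:
            kv = shared_kv                          # reuse: nothing new cached
        k, v = kv
        x = x + F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return x, kv

class CLAStack(nn.Module):
    """Layers alternate owning / reusing KV, so only half are cached."""

    def __init__(self, dim: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            CLABlock(dim, owns_kv=(i % 2 == 0)) for i in range(n_layers)
        )

    def forward(self, x):
        kv = None
        for layer in self.layers:
            x, kv = layer(x, shared_kv=kv)
        return x
```

Because only layers with `owns_kv=True` write to the cache, KV memory halves; the sharing layers attend over keys and values produced one layer earlier.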
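Block Transformer splits language modeling into an expensive global model over pooled block embeddings and a cheap local decoder over the tokens inside each block. A shape-level sketch, assuming mean pooling and a single context slot prepended to each block; the paper's embedder, masking, and decoding details differ.

```python
import torch
import torch.nn as nn

class BlockTransformer(nn.Module):
    """Global-to-local sketch: a coarse model over block embeddings,
    then a small local decoder within each block. Causal masking and
    the paper's block embedder are simplified away."""

    def __init__(self, vocab: int, dim: int, block: int = 8):
        super().__init__()
        self.block = block
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.global_model = nn.TransformerEncoder(layer, num_layers=4)  # over blocks
        self.local_model = nn.TransformerEncoder(layer, num_layers=2)   # within a block
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_blocks * block)
        b, s = tokens.shape
        h = self.embed(tokens).view(b, s // self.block, self.block, -1)
        block_emb = h.mean(dim=2)              # pool tokens -> block embeddings
        ctx = self.global_model(block_emb)     # expensive model, short sequence
        # Prepend each block's global context to its tokens for local decoding.
        local_in = torch.cat([ctx.unsqueeze(2), h], dim=2)
        local_in = local_in.flatten(0, 1)      # (b * n_blocks, block + 1, dim)
        out = self.local_model(local_in)[:, 1:]  # drop the context slot
        return self.lm_head(out).view(b, s, -1)
```

The inference win comes from the local decoder attending only within a block plus one context vector, so per-token attention cost no longer grows with the full sequence length.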