CCMat's Collections
Transformers & Attention

updated Feb 5

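Short, hedged code sketches of each paper's core idea follow the list.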
  • Linear Transformers with Learnable Kernel Functions are Better In-Context Models

    Paper • 2402.10644 • Published Feb 16, 2024 • 82

  • Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

    Paper • 2401.04658 • Published Jan 9, 2024 • 28

  • KAN: Kolmogorov-Arnold Networks

    Paper • 2404.19756 • Published Apr 30, 2024 • 113

  • Your Transformer is Secretly Linear

    Paper • 2405.12250 • Published May 19, 2024 • 159

  • Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

    Paper • 2405.12981 • Published May 21, 2024 • 34

  • Block Transformer: Global-to-Local Language Modeling for Fast Inference

    Paper • 2406.02657 • Published Jun 4, 2024 • 41
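The first paper studies linear attention whose kernel feature map is learned rather than fixed. Below is a minimal causal linear-attention sketch with a learnable squared affine feature map; the module name and the exact kernel parameterization are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class LearnableKernelLinearAttention(nn.Module):
    """Causal linear attention with a learnable feature map.

    Sketch only: the feature map is an elementwise affine transform
    followed by squaring, which keeps attention weights non-negative.
    The paper's exact parameterization and normalization differ.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.gamma = nn.Parameter(torch.ones(dim))   # learnable kernel scale
        self.beta = nn.Parameter(torch.zeros(dim))   # learnable kernel shift

    def feature_map(self, x: torch.Tensor) -> torch.Tensor:
        return (self.gamma * x + self.beta) ** 2     # non-negative kernel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.feature_map(self.q_proj(x))
        k = self.feature_map(self.k_proj(x))
        v = self.v_proj(x)
        # Causal prefix sums replace the softmax attention matrix:
        # out_t = phi(q_t) @ sum_{s<=t} phi(k_s) v_s^T, normalized.
        # (Materializing the per-step state is O(seq * dim^2) memory;
        # real kernels compute this in chunks.)
        kv = torch.einsum("bsd,bse->bsde", k, v).cumsum(dim=1)
        z = k.cumsum(dim=1)
        num = torch.einsum("bsd,bsde->bse", q, kv)
        den = torch.einsum("bsd,bsd->bs", q, z).clamp(min=1e-6)
        return num / den.unsqueeze(-1)
```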
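Lightning Attention-2 computes causal linear attention tile by tile: exact masked attention inside each block, plus a running KV state that summarizes all earlier blocks. The sketch below shows that intra/inter split only; it omits the paper's feature maps, decay term, and normalization.

```python
import torch

def blockwise_causal_linear_attention(q, k, v, block: int = 64):
    """Causal linear attention computed block by block (sketch).

    Intra-block: exact masked Q K^T within the tile.
    Inter-block: a running (d x e) KV state covers all past tiles.
    """
    b, s, d = q.shape
    e = v.shape[-1]
    out = torch.empty(b, s, e, dtype=q.dtype, device=q.device)
    state = torch.zeros(b, d, e, dtype=q.dtype, device=q.device)
    mask = torch.tril(torch.ones(block, block, dtype=torch.bool, device=q.device))
    for start in range(0, s, block):
        end = min(start + block, s)
        qb, kb, vb = q[:, start:end], k[:, start:end], v[:, start:end]
        n = end - start
        # Intra-block: causal (masked) attention inside the tile.
        scores = (qb @ kb.transpose(1, 2)).masked_fill(~mask[:n, :n], 0.0)
        intra = scores @ vb
        # Inter-block: contribution of all earlier tiles via the KV state.
        inter = qb @ state
        out[:, start:end] = intra + inter
        # Fold this tile into the running state before moving on.
        state = state + kb.transpose(1, 2) @ vb
    return out
```

The key point is that the loop never materializes a full seq-by-seq attention matrix, so cost grows linearly with sequence length.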
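KAN replaces fixed activations on nodes with a learnable univariate function on every edge. The paper parameterizes those functions with B-splines; the sketch below substitutes a fixed Gaussian RBF basis with learnable coefficients to keep the code short, so the basis choice and class name are assumptions.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """One Kolmogorov-Arnold layer: y_j = sum_i f_ij(x_i).

    Each edge (i, j) carries its own learnable 1-D function,
    expressed here as a combination of Gaussian RBF basis functions
    (the paper uses B-splines).
    """

    def __init__(self, in_dim: int, out_dim: int, n_basis: int = 8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-2, 2, n_basis))
        # One coefficient vector per edge: (out_dim, in_dim, n_basis).
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis: (batch, in_dim, n_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))
        # Evaluate every edge function and sum over the inputs.
        return torch.einsum("bin,oin->bo", basis, self.coef)

# Stacking layers gives a deep KAN, e.g.:
model = nn.Sequential(KANLayer(4, 16), KANLayer(16, 1))
```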
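"Your Transformer is Secretly Linear" reports that the transformation between consecutive decoder layers is close to linear. One way to probe that, sketched below: fit a least-squares linear map between the hidden states entering and leaving a layer and report explained variance. The paper's actual metric is a normalized Procrustes-style similarity, so this function is a simplified stand-in.

```python
import torch

def linearity_score(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """How well a single linear map explains one transformer layer.

    h_in, h_out: (n_tokens, dim) hidden states before/after a layer.
    Returns explained variance in [0, 1]; values near 1 mean the
    layer acts almost linearly on these tokens.
    """
    h_in = h_in - h_in.mean(dim=0)
    h_out = h_out - h_out.mean(dim=0)
    w = torch.linalg.lstsq(h_in, h_out).solution   # best linear map
    resid = h_out - h_in @ w
    return 1.0 - (resid.norm() / h_out.norm()).item() ** 2
```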
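Cross-Layer Attention shrinks the KV cache by letting layers share keys and values. A sketch, assuming the simplest variant in which every second layer reuses the KV of the layer below it, halving what must be cached; norms, MLPs, and multi-head splitting are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLABlock(nn.Module):
    """Attention block that can reuse K/V from the previous layer."""

    def __init__(self, dim: int, owns_kv: bool):
        super().__init__()
        self.owns_kv = owns_kv
        self.q_proj = nn.Linear(dim, dim)
        if owns_kv:
            self.k_proj = nn.Linear(dim, dim)
            self.v_proj = nn.Linear(dim, dim)

    def forward(self, x, shared_kv=None):
        q = self.q_proj(x)
        if self.owns_kv:
            kv = (self.k_proj(x), self.v_proj(x))   # goes into the KV cache
        else:
            kv = shared_kv                          # reuse: nothing new cached
        k, v = kv
        x = x + F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return x, kv

class CLAStack(nn.Module):
    """Layers alternate owning / reusing KV, so only half are cached."""

    def __init__(self, dim: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            CLABlock(dim, owns_kv=(i % 2 == 0)) for i in range(n_layers)
        )

    def forward(self, x):
        kv = None
        for layer in self.layers:
            x, kv = layer(x, shared_kv=kv)
        return x
```

Because only layers with `owns_kv=True` write to the cache, KV memory halves; the sharing layers attend over keys and values produced one layer earlier.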
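Block Transformer splits language modeling into an expensive global model over pooled block embeddings and a cheap local decoder over the tokens inside each block. A shape-level sketch, assuming mean pooling and a single context slot prepended to each block; the paper's embedder, masking, and decoding details differ.

```python
import torch
import torch.nn as nn

class BlockTransformer(nn.Module):
    """Global-to-local sketch: a coarse model over block embeddings,
    then a small local decoder within each block. Causal masking and
    the paper's block embedder are simplified away."""

    def __init__(self, vocab: int, dim: int, block: int = 8):
        super().__init__()
        self.block = block
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.global_model = nn.TransformerEncoder(layer, num_layers=4)  # over blocks
        self.local_model = nn.TransformerEncoder(layer, num_layers=2)   # within a block
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_blocks * block)
        b, s = tokens.shape
        h = self.embed(tokens).view(b, s // self.block, self.block, -1)
        block_emb = h.mean(dim=2)              # pool tokens -> block embeddings
        ctx = self.global_model(block_emb)     # expensive model, short sequence
        # Prepend each block's global context to its tokens for local decoding.
        local_in = torch.cat([ctx.unsqueeze(2), h], dim=2)
        local_in = local_in.flatten(0, 1)      # (b * n_blocks, block + 1, dim)
        out = self.local_model(local_in)[:, 1:]  # drop the context slot
        return self.lm_head(out).view(b, s, -1)
```

The inference win comes from the local decoder attending only within a block plus one context vector, so per-token attention cost no longer grows with the full sequence length.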