Transformers with a novel gating mechanism that skips layers from the middle outward: https://arxiv.org/pdf/2506.21103
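The summary says the gate skips layers "from the middle outward." The hypothetical helpers below illustrate that ordering. They are not the paper's implementation: the learned gate, its training objective, and any attention masking are all omitted. The assumption is that dropping k blocks from an n-layer residual stack removes the k blocks closest to the centre, with ties broken toward the lower index, so the skipped span widens symmetrically. The names `skipped_layers` and `forward_with_middle_skip` are illustrative only.

```python
def middle_out_order(num_layers: int) -> list[int]:
    """Layer indices ordered from the centre of the stack outward (hypothetical helper)."""
    center = (num_layers - 1) / 2
    # Sort by distance from the centre; ties go to the lower index.
    return sorted(range(num_layers), key=lambda i: (abs(i - center), i))


def skipped_layers(num_layers: int, num_skipped: int) -> list[int]:
    """Indices of the num_skipped middle-most layers, in ascending order."""
    if not 0 <= num_skipped <= num_layers:
        raise ValueError("num_skipped must be between 0 and num_layers")
    return sorted(middle_out_order(num_layers)[:num_skipped])


def forward_with_middle_skip(x, layers, num_skipped):
    """Run a residual stack, bypassing the middle num_skipped blocks (illustrative only)."""
    skipped = set(skipped_layers(len(layers), num_skipped))
    for i, layer in enumerate(layers):
        if i in skipped:
            continue  # skipped block: the residual stream passes through unchanged
        x = x + layer(x)
    return x


print(skipped_layers(6, 2))  # [2, 3]
print(skipped_layers(6, 4))  # [1, 2, 3, 4]
```

With six layers, skipping two removes blocks 2 and 3, and skipping four widens the span to blocks 1 through 4, so the first and last blocks are always the last to be dropped.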
Learning to Skip the Middle Layers of Transformers
Paper • 2506.21103
tim-lawson/skip-middle-fineweb-baseline-2-layers
Text Generation • 0.1B
tim-lawson/skip-middle-fineweb-baseline-4-layers
Text Generation • 0.1B
tim-lawson/skip-middle-fineweb-baseline-6-layers
Text Generation • 0.1B