Article mmBERT: ModernBERT goes Multilingual By orionweller and 5 others • 26 days ago • 105
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 172