Article Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models By wubingheng and 2 others • Aug 5 • 6
🧐Small-Papers Collection Technical support for the SmallDoges series models. • 2 items • Updated Aug 5 • 2
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Paper • 2412.11834 • Published Dec 16, 2024 • 8