Post: Is it time to start developing sparse attention again? https://github.com/SmallDoges/flash-sparse-attention
Article: Trainable Dynamic Mask Sparse Attention: Bridging Efficiency and Effectiveness in Long-Context Language Models
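The core idea behind dynamic mask sparse attention is that each query attends to only a small, input-dependent subset of keys rather than the full sequence. As a minimal illustrative sketch (a simplified top-k stand-in, not the trainable masking algorithm from the article itself):

```python
import numpy as np

def dynamic_mask_attention(Q, K, V, k=4):
    """Sparse attention sketch: each query keeps only its top-k
    highest-scoring keys; all other positions are masked to -inf
    before the softmax, so they receive zero attention weight.
    (Hypothetical simplification, not the authors' method.)"""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (n_q, n_k) attention logits
    # Build a per-query mask: True where a key is among the query's top-k
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, topk_idx, True, axis=-1)
    scores = np.where(mask, scores, -np.inf)   # prune masked positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # (n_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))
K = rng.normal(size=(32, 16))
V = rng.normal(size=(32, 16))
out = dynamic_mask_attention(Q, K, V, k=4)
print(out.shape)  # (8, 16)
```

With k fixed at 4, each query row computes only 4 nonzero attention weights regardless of sequence length; in the trainable setting the mask itself would be learned rather than derived from a hard top-k rule.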
Doge — family of small language models:
- SmallDoge/Doge-320M-Instruct · Question Answering · 0.3B params · Updated Aug 8
- SmallDoge/Doge-160M-Instruct · Question Answering · 0.2B params · Updated Aug 8
- SmallDoge/Doge-60M-Instruct · Question Answering · 54.6M params · Updated Aug 8
- SmallDoge/Doge-20M-Instruct · Question Answering · 13.1M params · Updated Apr 17