Post
228
Trainable selective sampling and sparse attention kernels are indispensable in the era of context engineering. We hope our work will be helpful to everyone! 🤗
Trainable Dynamic Mask Sparse Attention (2508.02124)
Trainable Dynamic Mask Sparse Attention (2508.02124)