Sparse Query Attention (SQA) Research

ReactiveAI's Collections

updated 12 days ago

Experimental models with Sparse Query Attention (SQA) layers. They reduce training time and cost by ~3-10% compared to GQA and MQA, at the same level of performance.