senfu's Collections

ToP

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference