akhauriyash/DeepSeek-R1-Distill-Llama-8B-Butler
Feature Extraction • Updated • 35
TokenButler -- Predict token importance for every attention head across the transformer, using only the first layer. Enables fine-grained, per-head token sparsity!
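
To make "fine-grained token sparsity" concrete, here is a minimal sketch of how per-head importance scores could be turned into a keep/drop mask over tokens. It is not TokenButler's predictor or its API: the function name `topk_token_mask`, the `keep` budget, and the example scores are all hypothetical, and the sketch assumes the importance scores have already been produced by some predictor.

```python
from typing import List


def topk_token_mask(importance: List[List[float]], keep: int) -> List[List[bool]]:
    """Build one boolean mask per head, keeping the `keep` highest-scoring tokens.

    `importance[h][t]` is an (assumed) importance score for token t in head h.
    """
    masks = []
    for head_scores in importance:
        k = min(keep, len(head_scores))
        # Indices of the k largest scores; ties go to the earlier token.
        top = sorted(range(len(head_scores)), key=lambda i: (-head_scores[i], i))[:k]
        kept = set(top)
        masks.append([i in kept for i in range(len(head_scores))])
    return masks


# Hypothetical scores: 2 heads, 5 tokens each.
scores = [
    [0.10, 0.70, 0.05, 0.90, 0.20],
    [0.60, 0.10, 0.80, 0.05, 0.30],
]
mask = topk_token_mask(scores, keep=2)
for head, row in enumerate(mask):
    print(head, row)
```

Each head gets its own mask, so two heads can attend to different subsets of tokens; that per-head choice is what distinguishes fine-grained sparsity from dropping the same tokens for every head.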