TokenButler: predict token importance for all heads across the transformer from the first layer itself, enabling fine-grained token sparsity.
YASH AKHAURI
akhauriyash
AI & ML interests: None yet
Recent Activity
Updated a model 38 minutes ago: akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReason_SFT_GRPO_14k
Published a model 42 minutes ago: akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT_GRPO_INDUCETEST
Updated a model about 1 hour ago: akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT
Organizations: None yet
Collections: 1
Models: 24
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReason_SFT_GRPO_14k
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT_GRPO_INDUCETEST
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT (Text Generation)
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_GRPO_14k_v4
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_GRPO_14k_v3 (Text Generation)
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_GRPO_14k_v2
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_GRPO_14k
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_14k (Text Generation)
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT_GRPO
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReasoner_SFT (Text Generation)