CHAI: Clustered Head Attention for Efficient LLM Inference Paper • 2403.08058 • Published Mar 12, 2024
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference Paper • 2410.00215 • Published Sep 30, 2024
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs Paper • 2505.20309 • Published May 22, 2025
any4: Learned 4-bit Numeric Representation for LLMs Paper • 2507.04610 • Published Jul 2025 • 6
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models Paper • 2403.00952 • Published Mar 1, 2024
Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation Paper • 2104.09648 • Published Apr 19, 2021
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Paper • 2206.14098 • Published Jun 28, 2022
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks Paper • 2403.04814 • Published Mar 7, 2024 • 1
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 80
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024 • 12
Learning Compiler Pass Orders using Coreset and Normalized Value Prediction Paper • 2301.05104 • Published Jan 9, 2023
Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction Paper • 2110.08232 • Published Oct 15, 2021 • 1
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding Paper • 2401.03003 • Published Jan 5, 2024 • 13
Sparse Iso-FLOP Transformations for Maximizing Training Efficiency Paper • 2303.11525 • Published Mar 21, 2023 • 1
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models Paper • 2303.10464 • Published Mar 18, 2023 • 1