OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators Paper • 2312.09411 • Published Dec 15, 2023
DistiLLM: Towards Streamlined Distillation for Large Language Models Paper • 2402.03898 • Published Feb 6, 2024 • 1
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration Paper • 2407.01425 • Published Jul 1, 2024
HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning Paper • 2409.09085 • Published Sep 11, 2024
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Paper • 2503.07067 • Published Mar 2025 • 26
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 33
Only Train Once: A One-Shot Neural Network Training And Pruning Framework Paper • 2107.07467 • Published Jul 15, 2021
DREAM: Diffusion Rectification and Estimation-Adaptive Models Paper • 2312.00210 • Published Nov 30, 2023 • 17