LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation • arXiv:2501.12976 • Published Jan 22, 2025
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? • arXiv:2505.15929 • Published May 21, 2025
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models • arXiv:2411.06839 • Published Nov 11, 2024
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities • arXiv:2212.06385 • Published Dec 13, 2022
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer • arXiv:2304.05659 • Published Apr 12, 2023
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast • arXiv:2405.14507 • Published May 23, 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models • arXiv:2404.02657 • Published Apr 3, 2024
Weight-Inherited Distillation for Task-Agnostic BERT Compression • arXiv:2305.09098 • Published May 16, 2023