RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy Paper • 2412.01129 • Published Dec 2, 2024
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding Paper • 2506.15745 • Published Jun 2025
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization Paper • 2311.05161 • Published Nov 9, 2023
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment Paper • 2407.03051 • Published Jul 3, 2024
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs Paper • 2410.01518 • Published Oct 2, 2024
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models Paper • 2308.06744 • Published Aug 13, 2023
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders Paper • 2211.11014 • Published Nov 20, 2022
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers Paper • 2302.11812 • Published Feb 23, 2023