VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits • Paper • 2505.10202 • Published about 1 month ago
Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective • Paper • 2505.16900 • Published 23 days ago
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention • Paper • 2505.10222 • Published about 1 month ago
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective • Paper • 2505.17997 • Published 23 days ago