ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published Jan 26, 2025 • 23
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 17
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 145
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning Paper • 2407.18248 • Published Jul 25, 2024 • 32