VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model • Paper • 2505.03739 • Published 5 days ago