TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill & Decode Inference Paper • 2508.15881 • Published 18 days ago • 8
Article SmolLM - blazingly fast and remarkably powerful By loubnabnl and 2 others • Jul 16, 2024 • 412