I played around with the new RXTX paper (XX^T) and was able to train nanoGPT with 4x4 RXTX matmuls in both the attention layer and the optimizer 🤕 It just works (well, I had to add some guardrails) and still saves about 5% of memory usage.

The Patch:
- Computes attention scores with 4x4 blockwise RXTX matmuls (no PyTorch dot product) - rough sketch below.
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4.
- An RXTX variant of Shampoo, with params reshaped into 4x4 blocks during each optimizer step.
- Uses ~5% fewer ops.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814
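
For anyone curious what the blockwise part looks like, here's a minimal sketch of the padding + 4x4 block bookkeeping. It is NOT the exact RXTX product schedule from the paper (the block products are left as plain matmuls here, only the symmetry of XX^T is exploited), and the names pad_to_multiple_of_4 / blockwise_xxt are just illustrative, not the ones in the notebook:

```python
import torch
import torch.nn.functional as F


def pad_to_multiple_of_4(x: torch.Tensor) -> torch.Tensor:
    """Zero-pad the row (sequence) dimension up to the next multiple of 4."""
    n = x.shape[-2]
    pad = (-n) % 4
    if pad:
        # F.pad pads dims from the last backwards: (last_l, last_r, 2nd_last_l, 2nd_last_r)
        x = F.pad(x, (0, 0, 0, pad))
    return x


def blockwise_xxt(x: torch.Tensor) -> torch.Tensor:
    """Compute S = X @ X.T via a 4x4 partition of the rows.

    Only the upper-triangular block products are computed explicitly; the
    lower triangle is filled by symmetry (S[j, i] = S[i, j].T). The actual
    RXTX kernel replaces these block products with its own multiplication
    schedule, which is where the op savings come from.
    """
    n_orig = x.shape[-2]
    x = pad_to_multiple_of_4(x)
    n = x.shape[-2]
    b = n // 4  # block height
    rows = [x[..., i * b:(i + 1) * b, :] for i in range(4)]

    S = x.new_zeros(*x.shape[:-2], n, n)
    for i in range(4):
        for j in range(i, 4):
            blk = rows[i] @ rows[j].transpose(-1, -2)
            S[..., i * b:(i + 1) * b, j * b:(j + 1) * b] = blk
            if i != j:
                S[..., j * b:(j + 1) * b, i * b:(i + 1) * b] = blk.transpose(-1, -2)

    # Drop the padded rows/cols before softmax / further use.
    return S[..., :n_orig, :n_orig]


# Quick self-check with a sequence length that is not a multiple of 4.
x = torch.randn(2, 10, 16)
assert torch.allclose(blockwise_xxt(x), x @ x.transpose(-1, -2), atol=1e-5)

# The same XX^T routine is what the Shampoo-style preconditioner statistics
# need (L ~ G @ G.T, R ~ G.T @ G), which is where the optimizer-side call
# slots in. G here is just a toy gradient for illustration.
G = torch.randn(64, 48)
L_stat = blockwise_xxt(G)
R_stat = blockwise_xxt(G.transpose(-1, -2))
```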