Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels By drbh and 1 other • 23 days ago • 57
Unifying Demonstration Selection and Compression for In-Context Learning Paper • 2405.17062 • Published May 27, 2024 • 1
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling Paper • 2504.07053 • Published Apr 9 • 4
Reply • Does Liger Kernel affect training speed at all? Is it faster, slower, or no difference compared to regular GRPO?
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14 • 97
Tiny Series Collection Tiny datasets that empower the foundation of Small Language Model! • 11 items • Updated Jan 26, 2024 • 42