Article Efficient LLM Pretraining: Packed Sequences and Masked Attention sirluk • Oct 7, 2024 • 71
Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 411
In your bitsandbytes config, why are you decompressing the weights to torch.float32 when the native format of phi3 is torch.bfloat16? This seems like a waste of memory.