Malaysian Reasoning Collection Full parameter post training using SFT warmup and GRPO. • 6 items • Updated Jun 24 • 1
MaLLaM Collection Pretrained from scratch with a 4096 context length on 90B tokens of Malaysian text, https://huggingface.co/papers/2401.14680 • 10 items • Updated Jun 24 • 15