---
license: apache-2.0
datasets:
- EleutherAI/pile
---

RWKV-7 trained on the Pile w/ "20b tokenizer" (332115325534 tokens)

0.1B = L12-D768, lr 8e-4 to 3e-5 cosine decay, wd 0.1, bsz 8x30x4096

0.4B = L24-D1024, lr 6e-4 to 2e-5 cosine decay, wd 0.1, bsz 8x30x4096

1.5B = L24-D2048, lr 5e-4 to 1.5e-5 cosine decay, wd 0.1, bsz 8x45x4096

Check https://github.com/BlinkDL/RWKV-LM for details.

How to run it: https://pypi.org/project/rwkv/ or https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v7
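
To make the hyperparameters above concrete, here is a minimal sketch of a cosine learning-rate decay and the implied step count, written in plain Python. It rests on assumptions not stated in this card: that `bsz AxBxC` multiplies out to tokens per optimizer step (for example 8 devices x 30 sequences x 4096 context length), that the schedule is a plain cosine with no warmup, and that decay spans the full token budget. The names `CONFIGS`, `tokens_per_step`, `approx_steps` and `cosine_lr` are hypothetical helpers for illustration, not part of RWKV-LM.

```python
import math

TOTAL_TOKENS = 332_115_325_534  # token count quoted in this card

# Learning-rate range and batch-size factors as listed in this card.
CONFIGS = {
    "0.1B": {"lr_init": 8e-4, "lr_final": 3e-5, "bsz": (8, 30, 4096)},
    "0.4B": {"lr_init": 6e-4, "lr_final": 2e-5, "bsz": (8, 30, 4096)},
    "1.5B": {"lr_init": 5e-4, "lr_final": 1.5e-5, "bsz": (8, 45, 4096)},
}


def tokens_per_step(bsz):
    """Assumption: the product of the bsz factors is tokens per optimizer step."""
    return math.prod(bsz)


def approx_steps(bsz, total_tokens=TOTAL_TOKENS):
    """Approximate number of optimizer steps needed to consume total_tokens."""
    return total_tokens // tokens_per_step(bsz)


def cosine_lr(progress, lr_init, lr_final):
    """Plain cosine decay from lr_init (progress=0) to lr_final (progress=1).

    Assumption: no warmup phase; the real training script may differ.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_final + (lr_init - lr_final) * cosine


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        steps = approx_steps(cfg["bsz"])
        mid = cosine_lr(0.5, cfg["lr_init"], cfg["lr_final"])
        print(f"{name}: {tokens_per_step(cfg['bsz'])} tokens/step, "
              f"~{steps} steps, lr at halfway ~{mid:.3e}")
```

Under these assumptions, the 0.1B and 0.4B runs would use 983040 tokens per step and the 1.5B run 1474560. For the actual schedule and batching, the RWKV-LM repository linked above is the reference.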