Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/
Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan
This repository contains checkpoints saved every 5k steps from a pretraining run on 9.8B tokens, using:
- Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/ (see the loading sketch below)
- Dataset: tokyotech-llm/swallow-code-v2
- Model architecture: Llama 3.2 1B (1 billion parameters)
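
As a quick sanity check, and assuming the custom tokenizer is published on the Hub in a standard Hugging Face format under `dudeperf3ct/codellm-tokenizer` (the repo this model lists as its base), it could be loaded like this:

```python
# Hedged sketch: assumes dudeperf3ct/codellm-tokenizer is loadable via
# transformers' AutoTokenizer; the part-1 writeup above is authoritative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dudeperf3ct/codellm-tokenizer")

# Round-trip a short code snippet to verify encode/decode works.
ids = tokenizer.encode("def add(a, b):\n    return a + b")
print(tokenizer.decode(ids))
```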
The repository contains detailed steps on how to run evaluation using PyTorch DCP checkpoints.
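
The repo's instructions are authoritative, but as a minimal sketch, PyTorch's `torch.distributed.checkpoint.format_utils.dcp_to_torch_save` utility can consolidate a sharded DCP checkpoint directory into a single `torch.save` file for single-process evaluation. The paths below are hypothetical placeholders, not the repo's actual layout:

```python
# Minimal sketch: convert a sharded PyTorch DCP checkpoint into one
# torch.save file, then load it on CPU for evaluation.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_dir = "checkpoints/step-5000"      # hypothetical: directory of DCP shards
out_path = "checkpoints/step-5000.pt"  # consolidated single-file checkpoint

# Consolidate the DCP shards into a single torch.save file.
dcp_to_torch_save(dcp_dir, out_path)

# Load the consolidated state dict; keys depend on how the run saved state
# (e.g. model vs. model + optimizer), so inspect before wiring up a model.
state = torch.load(out_path, map_location="cpu")
print(list(state.keys())[:10])
```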