Pretrain-Qwen-200M

paper | code

Pretrain-Qwen-200M is a 200M-parameter model with the Qwen architecture, conventionally pre-trained from scratch on the Pile for 50B tokens.
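
To put the training budget in perspective, the figures on this card can be turned into a rough back-of-the-envelope estimate. The sketch below uses the 203M parameter count and F32 tensor type reported for the checkpoint, plus the 50B-token budget stated above. The `6 * N * D` formula is a common rule of thumb for dense transformer training compute, not something stated by the authors, so the result should be read as an order-of-magnitude estimate only.

```python
# Rough estimates derived from numbers on this card.
# ASSUMPTION: training FLOPs ~ 6 * parameters * tokens (a widely used
# rule of thumb for dense transformers; not taken from the model card).

N_PARAMS = 203e6          # parameter count reported for the checkpoint
N_TOKENS = 50e9           # pre-training tokens stated on the card
BYTES_PER_PARAM_F32 = 4   # F32 stores each weight in 4 bytes


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute with the 6*N*D heuristic."""
    return 6.0 * n_params * n_tokens


tokens_per_param = N_TOKENS / N_PARAMS
flops = training_flops(N_PARAMS, N_TOKENS)
weights_gb = N_PARAMS * BYTES_PER_PARAM_F32 / 1e9

print(f"tokens per parameter: {tokens_per_param:.0f}")
print(f"approx. training FLOPs: {flops:.2e}")
print(f"F32 weight size: {weights_gb:.2f} GB")
```

Under these assumptions the model sees a few hundred tokens per parameter, and the F32 weights take a little under 1 GB on disk.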

We also open-source the tokenized pre-training corpus for reproducibility.

It is used as the baseline for MiniPLM-Qwen-200M.
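
A minimal sketch of how the checkpoint might be loaded for text generation, assuming it works with the standard `transformers` Auto classes under the repository ID `MiniLLM/Pretrain-Qwen-200M`. The helper name `generate` and its defaults are illustrative, not part of the release. If the checkpoint relies on custom modeling code, `trust_remote_code=True` may also be needed.

```python
# Hypothetical helper for sampling from the baseline checkpoint.
# ASSUMPTION: the repo loads with AutoTokenizer / AutoModelForCausalLM
# without extra arguments; adjust if the checkpoint needs custom code.

MODEL_ID = "MiniLLM/Pretrain-Qwen-200M"


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt` with the pre-trained model."""
    # Imported lazily so this module can be imported without the
    # heavy dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducible output
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The Pile is")` downloads the weights on first use. Because this is a base model pre-trained on the Pile with no instruction tuning, it continues text rather than following instructions.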

Evaluation

MiniPLM models achieve better performance given the same computation and scale well across model sizes.

Other Baselines

Citation

@article{miniplm,
    title={MiniPLM: Knowledge Distillation for Pre-Training Language Models}, 
    author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
    journal={arXiv preprint arXiv:2410.17215},
    year={2024}
}

Model size: 203M params (Safetensors, tensor type F32)