
This is an experimental depth upscale of Qwen2.5-14B to 21.4B total parameters: 24 layers were added (layers 30-41 inclusive, each repeated twice), bringing the layer count from 48 to 72.
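No merge recipe is published here, so the following is only a minimal sketch of how the duplication could be done with the Hugging Face `transformers` API. It assumes the base checkpoint `Qwen/Qwen2.5-14B`, 0-indexed layer numbers, and an interleaved arrangement (each layer in the 30-41 span immediately followed by its two extra copies); the actual upscale may have ordered the repeats differently.

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Load the base 48-layer model (checkpoint name is an assumption).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B")
layers = model.model.layers          # ModuleList of 48 decoder layers

new_layers, added_copies = [], []
for idx, layer in enumerate(layers):
    new_layers.append(layer)
    if 30 <= idx <= 41:              # repeated span (0-indexed, assumption)
        for _ in range(2):           # two extra copies per layer -> +24 layers
            dup = copy.deepcopy(layer)
            new_layers.append(dup)
            added_copies.append(dup)

model.model.layers = nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)   # 72

# Re-number the attention modules so KV-cache indexing matches the new depth.
for new_idx, layer in enumerate(new_layers):
    layer.self_attn.layer_idx = new_idx
```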

The added layers had their o_proj and down_proj modules zeroed out prior to retraining, as seen in other recent depth-upscaling experiments, so that each duplicated layer initially contributes nothing beyond its residual connections.
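Continuing the sketch above, zeroing those two projections in the inserted copies only (via the hypothetical `added_copies` list) makes each new layer start out as an identity map over the residual stream:

```python
import torch

# Zero the attention output projection and the MLP down-projection of the
# inserted copies. With these weights at zero, the attention and MLP
# sub-blocks add nothing to the residual stream, so the copied layer
# initially passes activations through unchanged.
with torch.no_grad():
    for layer in added_copies:
        layer.self_attn.o_proj.weight.zero_()
        layer.mlp.down_proj.weight.zero_()
```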

The upscaled model was then trained on roughly 10M tokens of instruct and creative-writing data, the majority of it general instruct data, to try to repair the zeroed-out connections.
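The exact data mix and training setup are not published here; purely as an illustration, a continued-training run with the `transformers` Trainer might look like the sketch below. The dataset file, sequence length, and hyperparameters are placeholders, not the settings actually used.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

# Placeholder file standing in for the instruct/creative mix.
data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,  # the upscaled model from the sketches above
    args=TrainingArguments(
        output_dir="qwen2.5-21b-upscale",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```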