This model is purely for experimental purposes. Fine-tuned on finetome, pinkchat-sft, and pinkchat-dpo, the model is able to generate coherent text.

Additional fine-tuning is needed.

The model does not perform well, yet it does work. It was fine-tuned on 2 billion tokens of mostly synthetic data and some human-made data during the SFT process.

Phase 0: In mergekit, we remove 16 of the 28 layers (leaving 12: Pinkstackorg/Qwen2.5-3Bprunebase-1M) using a passthrough merge.
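For reference, this kind of passthrough prune is driven by a mergekit YAML config. Below is a minimal sketch that writes one and runs the `mergekit-yaml` CLI; the exact layer ranges we kept are not listed here, so the ranges shown are assumptions that simply keep 12 of the 28 layers.

```python
# Minimal sketch of the Phase 0 passthrough prune (requires `pip install mergekit`).
# The [0, 6] and [22, 28] layer ranges are assumptions, not the actual ranges
# used for Pinkstackorg/Qwen2.5-3Bprunebase-1M; together they keep 12 of 28 layers.
import subprocess

config = """\
slices:
  - sources:
      - model: Qwen/Qwen2.5-7B
        layer_range: [0, 6]     # keep the first 6 layers (assumed)
  - sources:
      - model: Qwen/Qwen2.5-7B
        layer_range: [22, 28]   # keep the last 6 layers (assumed)
merge_method: passthrough
dtype: float16
"""

with open("prune.yaml", "w") as f:
    f.write(config)

# mergekit-yaml writes the pruned model into the given output directory
subprocess.run(["mergekit-yaml", "prune.yaml", "qwen2.5-3b-prunebase"], check=True)
```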

Phase 1a: Fine-tuning the model on a limited amount of data with a LoRA rank of 16 (21% of parameters trained). This phase is to get the model started on generating some sense; it is mainly for healing the model after pruning and nothing else, so only very low-quality text is generated at this point.
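A rough sketch of what the Phase 1a setup looks like with Unsloth; the sequence length and alpha are assumptions, and including embed_tokens and lm_head in target_modules is what pushes the trainable fraction up to around 21%.

```python
# Hypothetical Phase 1a setup: rank-16 LoRA that also trains the embedding
# and output layers. Sequence length and lora_alpha are assumed values.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Pinkstackorg/Qwen2.5-3Bprunebase-1M",
    max_seq_length=2048,   # assumed; Phase 1a used a modest context
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",   # included here, dropped in Phase 1b
    ],
)
```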

Phase 1b: Fine-tuning the model on a larger amount of data with a LoRA rank of 64 (2.75% of parameters trained, due to removing lm_head and embed_tokens from target_modules) and a high sequence length, on the same dataset (finetome) as Phase 1a. This makes the model much better at all tasks, but it is still not able to generate properly high-quality text; it is better than after 1a, though.
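Relative to 1a, the Phase 1b adapter looks roughly like this; dropping embed_tokens and lm_head is why the trainable fraction falls to about 2.75% even though the rank goes up.

```python
# Sketch of the Phase 1b adapter, continuing from the Phase 1a sketch above
# (`model` is the pruned base loaded there). lora_alpha is an assumed value.
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],  # no embed_tokens / lm_head, so only ~2.75% of params are trainable
)
```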

Phase 2: Fine-tuning the model on a special dataset mixing synthetic generations, human text, code generations, math generations, and some QwQ generations for advanced reasoning. Phase 2 makes the model able to generate higher-quality text, but it has some issues: we used a low sequence length, intended only for knowledge distillation, so the model sometimes falls into loops when trying to generate long text. It is usable.
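A sketch of what the Phase 2 SFT run looks like with TRL's SFTTrainer, continuing from the sketches above; the dataset file, column name, sequence length, and hyperparameters are all assumptions.

```python
# Hypothetical Phase 2 SFT run. `model` and `tokenizer` come from the sketches
# above; "phase2_mix.jsonl" and the "text" column are placeholder names.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("json", data_files="phase2_mix.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumed column name
    max_seq_length=512,          # assumed low value, per the note above
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="phase2-sft",
    ),
)
trainer.train()
```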

Phase 3: DPO on our Pinkstack/Pinkchat-dpo-19k-en dataset at a higher sequence length. This phase is highly important: it makes the model safer, better aligned, and better at following prompts. The model performs better and loops less, but it is still not great.
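A sketch of the Phase 3 DPO run, using the older TRL DPOTrainer signature that Unsloth notebooks rely on; only the dataset name comes from this card, and everything else (beta, lengths, learning rate, split name) is an assumption.

```python
# Hypothetical Phase 3 DPO run, continuing from the sketches above.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

dpo_dataset = load_dataset("Pinkstack/Pinkchat-dpo-19k-en", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # with a PEFT adapter, TRL uses the frozen base as reference
    beta=0.1,                # assumed DPO temperature
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,
    max_length=2048,         # assumed "higher sequence length"
    max_prompt_length=1024,  # assumed
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=5e-6,
        num_train_epochs=1,
        output_dir="phase3-dpo",
    ),
)
trainer.train()
```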

Phase 3 was done inside Google Colab; the other phases were run locally.

Uploaded model

  • Developed by: Pinkstack
  • License: apache-2.0
  • Finetuned from model: Pinkstack/qwen2.5-3b-1m-sft-phase2-max96-lowloss

This Qwen2 model was trained with Unsloth and Hugging Face's TRL library.

Model tree for Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview

  • Base model: Qwen/Qwen2.5-7B