Question about the necessity of Phase 1 in MiniLLM
#3, opened by HyeongSoo
I was deeply impressed by your MiniLLM paper and am eager to reproduce some of your experiments.
One question I have is regarding the necessity of Phase 1. As I understand it, you perform supervised fine-tuning (SFT) in Phase 1 and then select the checkpoint with the lowest validation loss as the gpt2-init-model for the next phase. Could you please clarify why this initial Phase 1 is necessary?
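To make my understanding of the selection step concrete, here is a minimal sketch of how I picture choosing the Phase 1 checkpoint. The function name, checkpoint names, and loss values are hypothetical placeholders of mine, not taken from the MiniLLM code or paper.

```python
from typing import Dict


def select_init_checkpoint(val_losses: Dict[str, float]) -> str:
    """Return the name of the checkpoint with the lowest validation loss."""
    if not val_losses:
        raise ValueError("no checkpoints to choose from")
    # min() over the dict keys, ranked by their recorded validation loss
    return min(val_losses, key=val_losses.get)


# Hypothetical SFT checkpoints and validation losses, for illustration only.
example_losses = {
    "sft-step-1000": 3.42,
    "sft-step-2000": 3.18,
    "sft-step-3000": 3.25,
}

init_checkpoint = select_init_checkpoint(example_losses)
print(init_checkpoint)
```

If this matches the intended procedure, the selected checkpoint would be what gets passed on as the init model for the next phase.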