
Question about the necessity of Phase 1 in MiniLLM

#3
by HyeongSoo - opened

I was deeply impressed by your MiniLLM paper and am eager to reproduce some of your experiments.
One question I have is regarding the necessity of Phase 1. As I understand it, you perform supervised fine-tuning (SFT) in Phase 1 and then select the checkpoint with the lowest validation loss as the gpt2-init-model for the next phase. Could you please clarify why this initial Phase 1 is necessary?
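For concreteness, the checkpoint-selection step I am describing could be sketched as follows. This is only an illustration of my understanding, not code from the MiniLLM repository; the checkpoint names and losses are hypothetical placeholders.

```python
# Hypothetical sketch: after Phase 1 SFT, pick the checkpoint with the
# lowest validation loss to serve as the init model for the next phase.
# The paths and loss values below are illustrative, not from the paper.

# validation loss recorded for each saved SFT checkpoint
checkpoints = {
    "ckpt-1000": 2.41,
    "ckpt-2000": 2.18,
    "ckpt-3000": 2.25,
}

# the checkpoint minimizing validation loss becomes the init model
best_ckpt = min(checkpoints, key=checkpoints.get)
print(best_ckpt)
```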
