Question about the necessity of Phase 1 in MiniLLM

by HyeongSoo - opened Jul 7

I was deeply impressed by your MiniLLM paper and am eager to reproduce some of your experiments.
One question I have is regarding the necessity of Phase 1. As I understand it, you perform supervised fine-tuning (SFT) in Phase 1 and then select the checkpoint with the lowest validation loss as the gpt2-init-model for the next phase. Could you please clarify why this initial Phase 1 is necessary?
