The pythia-*-vanilla models are fine-tuned on 10M sequences from the Pile using AdamW. The pythia-*-myopic models are fine-tuned on the same sequences using myopic descent.