training time
Hey, I have a single question for you. Please do answer.
I saw that the GPU you are using is a 4060 Ti 16GB.
For how many epochs did you train this model, and what was your training time?
Hi there, SampadKar!
Unfortunately I can't remember. What I do remember is that I used the LLaMA Pro script from LLaMA Factory to extend the layers over several stages. Each stage had a theme, like science or code, and because the full fine-tune was so memory intensive I think I extended the model with 2 new layers per stage. Since I did this, a lot of new optimizations have been published and implemented in various frameworks.
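In case it helps, the rough idea of that block expansion looks something like this (just a sketch, not my exact script; the model name, dtype, and layer counts are placeholders, and LLaMA Factory's own llama_pro script handles the details):

```python
# Sketch of LLaMA Pro-style block expansion with plain transformers.
# Placeholders throughout: model name, dtype, num_expand are not my actual settings.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16  # placeholder model
)

num_expand = 2                    # new layers added in this stage
layers = model.model.layers       # ModuleList of decoder blocks
step = len(layers) // num_expand  # spread the copies evenly through the stack

new_layers = torch.nn.ModuleList()
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if (i + 1) % step == 0:
        expanded = copy.deepcopy(layer)
        # Zero the output projections so the copied block starts as an
        # identity mapping: only the residual path carries the signal.
        expanded.self_attn.o_proj.weight.data.zero_()
        expanded.mlp.down_proj.weight.data.zero_()
        new_layers.append(expanded)

model.model.layers = new_layers
model.config.num_hidden_layers = len(new_layers)
model.save_pretrained("expanded-model")  # then fine-tune only the new layers
```

As far as I remember, only the newly inserted blocks get trained in each stage while the original ones stay frozen, which is what made full fine-tuning feasible on the 16GB card at all.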
I can't remember exactly how long it took, but I guess it was a week or so, with 2-day runs for each stage. Not counting all the failures, of course.
I hope this helps a little.