training time
Hey, I have a single question for you. Please do answer.
I saw that the GPU you are using is a 4060 Ti 16GB.
For how many epochs did you train this model, and what was your training time?
Hi there, SampadKar!
Unfortunately I can't remember. What I do remember is that I used the LLaMA Pro script from LLaMA Factory to extend the layers over several stages. Each stage had a theme, like science or code, and because the full fine-tune was so memory intensive I think I extended the model with 2 new layers per stage. Since I did this, a lot of new optimizations have been published and implemented in various frameworks.
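In case it helps, the rough idea of that block expansion looks something like this (just a sketch, not my exact script; the model name, dtype, and layer counts are placeholders, and LLaMA Factory's own llama_pro script handles the details):

```python
# Sketch of LLaMA Pro-style block expansion with plain transformers.
# Placeholders throughout: model name, dtype, num_expand are not my actual settings.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16  # placeholder model
)

num_expand = 2                    # new layers added in this stage
layers = model.model.layers       # ModuleList of decoder blocks
step = len(layers) // num_expand  # spread the copies evenly through the stack

new_layers = torch.nn.ModuleList()
for i, layer in enumerate(layers):
    new_layers.append(layer)
    if (i + 1) % step == 0:
        expanded = copy.deepcopy(layer)
        # Zero the output projections so the copied block starts as an
        # identity mapping: only the residual path carries the signal.
        expanded.self_attn.o_proj.weight.data.zero_()
        expanded.mlp.down_proj.weight.data.zero_()
        new_layers.append(expanded)

model.model.layers = new_layers
model.config.num_hidden_layers = len(new_layers)
model.save_pretrained("expanded-model")  # then fine-tune only the new layers
```

As far as I remember, only the newly inserted blocks get trained in each stage while the original ones stay frozen, which is what made full fine-tuning feasible on the 16GB card at all.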
I can't remember exactly how long it took, but I guess it was a week or so, with 2-day runs for each stage. Not counting all the failures, of course.
I hope this helps a little.