Sweaterdog/Smol-reason2
My first use of GRPO fine-tuning techniques. What I learn from this model will be applied to future Andy models.
Note: Datasets for the Smol-reason family