In my use case, significantly worse than the 14B-DPO variant
#4
by cmp-nct - opened
Just wanted to share some feedback. I was testing the new variant and compared it to its smaller DPO-trained 14B sibling.
It produced significantly worse results: the output was less well written and followed the instructions less precisely.
The task was summarizing structured input into well-written prose, based on a set of instructions.
There are likely other tasks where this model performs better, but at this point I would not choose it.