Observations+benchmarks

#1
by ChuckMcSneed - opened
  • It seems to work at 17k context, but loses minor details, just like Aurelian.
  • It works significantly worse with the Alpaca format than the original Goliath.
  • At short context, it has problems with formatting.
  • Overall performance is worse than the original Goliath, as expected.
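
For anyone who wants to reproduce the Alpaca comparison, here is a minimal sketch of the prompt layout, assuming the standard Alpaca template without an input field. The helper name `build_alpaca_prompt` and the example instruction are hypothetical, and the exact prompts used for the benchmark are not stated in this thread.

```python
# Standard Alpaca template (no-input variant). This is an assumption:
# the thread does not say which Alpaca variant was used for testing.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a single instruction in the Alpaca prompt template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


# Hypothetical example instruction, only to show the resulting layout.
prompt = build_alpaca_prompt("Summarize the story so far in three sentences.")
print(prompt)
```

The model's reply is expected to continue right after the `### Response:` line, so the same instruction can be sent to both models with only the weights changed.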

image.png

The trend of losing ~30% of SP on my meme benchmark after adding 32k context continues even here.

If anyone else has made benchmarks, please post them.

Thanks again.

I'm experimenting with a bit of fine-tuning to try to get the 32K models closer to the 4K performance. I'll upload a checkpoint when it is done.

Does one of your scores capture this loss of minor details, or is it an anecdotal observation?
