'use_cache: false' reduces tokens/sec significantly

#10
by Astris - opened

I saw a 3x reduction in tokens/sec with the cache disabled compared to enabled. I don't know why it was disabled, but given the size of the difference it would probably be better to have it enabled by default. I used the Hugging Face loader in text-generation-webui and ran the model on a 3090.

The config.json file already has use_cache set to true. When I loaded the model in textgen, it stayed set to true. Is there anything special about your setup?
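For anyone who downloaded the model earlier and wants to check their local copy, here is a minimal sketch that flips the flag in a local config.json. It only uses the standard library; the function name `enable_use_cache` and the example path are made up for illustration, so adjust them to wherever your loader keeps the model files.

```python
import json
from pathlib import Path


def enable_use_cache(config_path):
    """Set "use_cache" to true in a local config.json.

    Returns the previous value (None if the key was missing).
    Hypothetical helper, not part of any library.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("use_cache")
    if previous is not True:
        # Only rewrite the file when the flag actually needs to change.
        config["use_cache"] = True
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous


if __name__ == "__main__":
    # Example path only; point this at your own local model folder.
    example = Path("models/your-model-folder/config.json")
    if example.exists():
        print("previous use_cache:", enable_use_cache(example))
```

Re-downloading config.json from the repo should have the same effect; the script is just a quick way to check and fix an older local copy.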

To clarify, I only fixed this yesterday. (I kept forgetting.)

Oh! I should have looked at my local copy before I commented. I see that use_cache was set to false there. Got a nice little speed increase: not 3x, but from 7 it/s to 11 it/s on a 4090. Thanks metaprotium, I wouldn't have known if you hadn't posted. And thanks for the model, Gryphe, it's seriously awesome.

Astris changed discussion status to closed