get_cuda_version in core.sageattn_qk_int8_pv_fp8_cuda causes severe slowdown on Windows

#1
by pamparamm - opened

I believe the output of core.get_cuda_version() should be cached: it causes a severe slowdown (about 10x slower than sageattention2 from GitHub) when using sageattn_qk_int8_pv_fp8_cuda.
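A minimal sketch of the caching idea, not the repository's actual code. It assumes the version lookup is expensive and that its result does not change while the process runs. `_probe_cuda_version` and the `(12, 4)` value are hypothetical stand-ins for the real lookup.

```python
from functools import lru_cache

probe_calls = 0  # counts how often the expensive probe actually runs


def _probe_cuda_version():
    """Hypothetical stand-in for the real, slow CUDA version lookup."""
    global probe_calls
    probe_calls += 1
    return (12, 4)  # placeholder value, not a real query


@lru_cache(maxsize=None)
def get_cuda_version():
    # The first call runs the probe; later calls return the stored result.
    return _probe_cuda_version()


# Simulate many attention calls that each ask for the CUDA version.
for _ in range(1000):
    get_cuda_version()
```

With the decorator in place the probe runs only once, however many times the attention function asks for the version.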

Good point, that’s a reasonable suggestion. I’ll check and update the code accordingly.

Thanks for the fix! After 3ea20a2, everything seems to work as expected.

pamparamm changed discussion status to closed
