get_cuda_version in core.sageattn_qk_int8_pv_fp8_cuda causes severe slowdown on Windows
#1, opened by pamparamm
I believe the output of core.get_cuda_version() should be cached. Without caching, it causes a severe slowdown (about 10x slower than sageattention2 from GitHub) when using sageattn_qk_int8_pv_fp8_cuda.
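For illustration, here is a minimal sketch of what caching could look like. It does not reproduce the real core.get_cuda_version(); `_probe_cuda_version` is a hypothetical stand-in for whatever expensive lookup that function performs, and the returned `(12, 4)` tuple is a made-up value. The idea is just that `functools.lru_cache` runs the probe once and serves the stored result on every later call.

```python
import functools
import time

# Counts how many times the expensive probe actually runs.
call_count = 0


def _probe_cuda_version():
    """Hypothetical stand-in for an expensive CUDA version lookup.

    The real lookup is not shown in this thread; the sleep only simulates cost.
    """
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulated overhead of the lookup
    return (12, 4)    # made-up (major, minor) value for the example


@functools.lru_cache(maxsize=None)
def get_cuda_version_cached():
    """Run the probe once per process and reuse the result afterwards."""
    return _probe_cuda_version()


def attention_call():
    """Hypothetical hot path that checks the CUDA version on every call."""
    major, _minor = get_cuda_version_cached()
    return major >= 12


for _ in range(100):
    attention_call()

# Number of times the expensive probe ran across 100 hot-path calls.
print(call_count)
```

Since the CUDA version should not change while the process is running, a process-wide cache like this seems safe; a module-level variable set once at import time would be an equally reasonable choice.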
Good point, that’s a reasonable suggestion. I’ll check and update the code accordingly.
pamparamm changed discussion status to closed