What are folks' opinions on 4KM quants? Are they viable?

#3
by Permahuman - opened

The question is in the discussion title. I really want the whole model to fit on a 3090 without offloading to RAM, so I can get those legendary token generation speeds. I have heard that quality may drop off below 5- or 6-bit quants. Has anyone tried a 4KM quant yet? I have a really bad rural internet connection, so I would really appreciate some feedback before I commit to a download.
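
For anyone wanting to sanity-check the "fits on a 3090" part, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the parameter count and bits-per-weight values are placeholders (the real numbers depend on the model and on how the quant was produced), and the 2 GiB overhead for KV cache and buffers is a guess that grows with context length. The file size listed on the model page is a better guide than this estimate.

```python
# Rough estimate only: weights size ~= parameters * bits per weight / 8.
# All numbers passed in below are hypothetical placeholders, not measurements.

GIB = 2**30


def estimate_weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def fits_in_vram(n_params: float, bits_per_weight: float,
                 vram_gib: float = 24.0, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM.

    vram_gib defaults to 24, the VRAM on a 3090.
    overhead_gib is a guessed allowance for KV cache and scratch buffers.
    """
    return estimate_weights_gib(n_params, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Hypothetical model sizes and an assumed ~4.8 effective bits per weight.
    for n_params in (8e9, 34e9, 70e9):
        size = estimate_weights_gib(n_params, 4.8)
        print(f"{n_params / 1e9:.0f}B params: ~{size:.1f} GiB, "
              f"fits: {fits_in_vram(n_params, 4.8)}")
```

This says nothing about quality, only about whether full GPU offload is plausible at a given quant level.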
