Possible to run with 24GB VRAM?
#14
by happyTonakai - opened
Has anyone managed to run it with 24GB VRAM? Using 4-bit quantization?
Did you quantize it on your own? We have no official quantizations for this model yet. Even with AWQ or 4-bit, we'd need a cluster of two 24GB GPUs to run it.
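For reference, here's a minimal sketch of what a do-it-yourself 4-bit load with transformers + bitsandbytes could look like. The thread doesn't name the checkpoint, so `MODEL_ID` is a placeholder, and `device_map="auto"` simply shards the layers across whatever GPUs are visible (e.g. both 24GB cards):

```python
# Requires: pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "org/model-name"  # placeholder: substitute the actual repo id

# 4-bit NF4 storage with bf16 compute -- the usual bitsandbytes setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # splits layers across multiple GPUs if one isn't enough
)
```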
It's a 3B-per-expert model, so you can run it on a common CPU with 32 GB of RAM without much delay.
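A minimal sketch of CPU-only inference with transformers, again assuming the placeholder `MODEL_ID` above; bf16 halves the memory footprint versus fp32, which is what makes the 32 GB RAM budget plausible:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "org/model-name"  # placeholder: substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half-size weights; needs a reasonably recent CPU/PyTorch
    device_map="cpu",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```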
Good point! Can you do that with vLLM?
Absolutely. I always use transformers, but I don't see why not.
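If you want to try vLLM, a sketch of the offline API is below; the repo id is still a placeholder, and whether it actually works depends on vLLM supporting this architecture (and, if you quantize, the quantization scheme, e.g. an AWQ checkpoint):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-name",  # placeholder: substitute the actual repo id
    tensor_parallel_size=2,  # split across two 24GB GPUs
    quantization="awq",      # only if an AWQ checkpoint exists for this model
)

outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=50))
print(outputs[0].outputs[0].text)
```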