can it work on single 3090?
#7 opened by hamaadtahiir
What are the VRAM requirements, and can I run it on a single 3090 with vLLM?
It can be made to fit, but it's right at the limit - you'll need to do basically everything you can to reduce memory usage.
I'm running on an A10G (24GB) with `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` and `max_model_len=4800`, `enforce_eager=True`, `gpu_memory_utilization=0.98`, `kv_cache_dtype="fp8"`. This works, but it isn't practical for production use: it's slow, prone to OOM, and you have to really scrimp on tokens.
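For reference, here is a minimal sketch of how those settings fit together in a vLLM script, assuming a 24GB card. The model ID is a placeholder for whatever model this thread is about, and the exact numbers (context length, memory utilization) are the ones quoted above, not tuned recommendations:

```python
import os

# Must be set before CUDA is initialized; lets the allocator grow segments
# instead of reserving fixed blocks, which helps avoid fragmentation OOMs.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from vllm import LLM, SamplingParams

llm = LLM(
    model="<model-repo-id>",      # placeholder: the model discussed in this thread
    max_model_len=4800,           # small context window to shrink the KV cache
    enforce_eager=True,           # skip CUDA graph capture to save memory
    gpu_memory_utilization=0.98,  # hand nearly all VRAM to vLLM
    kv_cache_dtype="fp8",         # 8-bit KV cache roughly halves cache memory
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

Each of these knobs trades speed or capacity for memory, so expect lower throughput and a hard ceiling on prompt plus output length compared to a larger GPU.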