vllm (#10) opened by regzhang
Can the vLLM inference framework run inference with this model? How would it need to be configured to run on a setup with 8 NVIDIA RTX 3090 GPUs?
It uses the Mixtral architecture, which is supported by vLLM, but I have no idea how to set it up on 8 NVIDIA RTX 3090 GPUs.
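For reference, here is a minimal sketch of how a Mixtral-style checkpoint is typically loaded with vLLM's Python API, sharded across all 8 GPUs with tensor parallelism. The model id below is a placeholder assumption (substitute the actual repo id for this model), and the memory settings are illustrative, not tuned values:

```python
# Minimal sketch: loading a Mixtral-style model with vLLM on 8 GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder; use this repo's model id
    tensor_parallel_size=8,        # shard the weights across all 8 RTX 3090s
    dtype="float16",               # 3090s (Ampere) also support bfloat16
    gpu_memory_utilization=0.90,   # leave some headroom on each 24 GB card
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```

The same idea applies to vLLM's OpenAI-compatible server via `python -m vllm.entrypoints.openai.api_server --model <model-id> --tensor-parallel-size 8`. Roughly speaking, 8 x 24 GB gives 192 GB of total VRAM, which should accommodate a Mixtral-8x7B-class model in fp16 (on the order of 90 GB of weights) plus KV cache, though this has not been verified on this exact model.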