vllm (#10) opened by regzhang
Can the vLLM inference framework run inference with this model? How would it need to be configured to run on a setup with 8 NVIDIA RTX 3090 GPUs?
It uses the Mixtral architecture, which is supported by vLLM, but I have no idea how to set it up on 8 NVIDIA RTX 3090 GPUs.
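For reference, here is a minimal sketch of how a Mixtral-style checkpoint is typically loaded with vLLM's Python API, sharded across all 8 GPUs with tensor parallelism. The model id below is a placeholder assumption (substitute the actual repo id for this model), and the memory settings are illustrative, not tuned values:

```python
# Minimal sketch: loading a Mixtral-style model with vLLM on 8 GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder; use this repo's model id
    tensor_parallel_size=8,        # shard the weights across all 8 RTX 3090s
    dtype="float16",               # 3090s (Ampere) also support bfloat16
    gpu_memory_utilization=0.90,   # leave some headroom on each 24 GB card
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```

The same idea applies to vLLM's OpenAI-compatible server via `python -m vllm.entrypoints.openai.api_server --model <model-id> --tensor-parallel-size 8`. Roughly speaking, 8 x 24 GB gives 192 GB of total VRAM, which should accommodate a Mixtral-8x7B-class model in fp16 (on the order of 90 GB of weights) plus KV cache, though this has not been verified on this exact model.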