vllm #10
opened by regzhang
Can the vLLM inference framework run inference with this model? How would it need to be adjusted or configured to run on a setup with 8 Nvidia RTX 3090 GPUs?
It is the Mixtral architecture, which is supported by vLLM, but I have no idea how to set it up with 8 Nvidia RTX 3090 GPUs.
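For the multi-GPU part, a minimal sketch of what I would try with vLLM's offline Python API, assuming the model fits across the 8 cards with tensor parallelism (the model id below is a placeholder, not this repo's actual path):

```python
from vllm import LLM, SamplingParams

# Shard the model across all 8 RTX 3090s with tensor parallelism.
# dtype="half" since the 3090 (Ampere) handles fp16 fine;
# the model name here is only a placeholder.
llm = LLM(
    model="path-or-hf-id-of-this-model",  # placeholder
    tensor_parallel_size=8,
    dtype="half",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```

The key knob is `tensor_parallel_size=8`, which tells vLLM to split the weights and KV cache across the eight GPUs; whether the model actually fits in 8 x 24 GB depends on its size and the context length you need.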
I think you are looking for this: https://github.com/vllm-project/vllm/pull/2293
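I have not checked the details of that PR, but if a quantized checkpoint ends up being the way to fit this on 3090s, vLLM's `LLM` constructor also takes a `quantization` argument (e.g. `"gptq"` or `"awq"`) that can be combined with the tensor-parallel setup above; the checkpoint id here is again a placeholder:

```python
from vllm import LLM

# Hypothetical quantized checkpoint id; combine quantization with tensor parallelism.
llm = LLM(
    model="some-gptq-quantized-checkpoint",  # placeholder
    tensor_parallel_size=8,
    quantization="gptq",
)
```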