vllm #10
opened by regzhang
Can the vLLM inference framework run inference with this model? How would it need to be adjusted or configured to run on a setup with 8 Nvidia RTX 3090 GPUs?
It is the Mixtral architecture, which is supported by vLLM, but I have no idea how to set it up with 8 Nvidia RTX 3090 GPUs.
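For the multi-GPU part, a minimal sketch of what I would try with vLLM's offline Python API, assuming the model fits across the 8 cards with tensor parallelism (the model id below is a placeholder, not this repo's actual path):

```python
from vllm import LLM, SamplingParams

# Shard the model across all 8 RTX 3090s with tensor parallelism.
# dtype="half" since the 3090 (Ampere) handles fp16 fine;
# the model name here is only a placeholder.
llm = LLM(
    model="path-or-hf-id-of-this-model",  # placeholder
    tensor_parallel_size=8,
    dtype="half",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```

The key knob is `tensor_parallel_size=8`, which tells vLLM to split the weights and KV cache across the eight GPUs; whether the model actually fits in 8 x 24 GB depends on its size and the context length you need.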
I think you are looking for this: https://github.com/vllm-project/vllm/pull/2293
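I have not checked the details of that PR, but if a quantized checkpoint ends up being the way to fit this on 3090s, vLLM's `LLM` constructor also takes a `quantization` argument (e.g. `"gptq"` or `"awq"`) that can be combined with the tensor-parallel setup above; the checkpoint id here is again a placeholder:

```python
from vllm import LLM

# Hypothetical quantized checkpoint id; combine quantization with tensor parallelism.
llm = LLM(
    model="some-gptq-quantized-checkpoint",  # placeholder
    tensor_parallel_size=8,
    quantization="gptq",
)
```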