OOM on RTX 3090 with vLLM
#1 by willowill5 · opened
Weird bug with this model or vLLM: the original HF model loads fine on 24 GB, but the AWQ version OOMs under vLLM.
```
python -m vllm.entrypoints.api_server --model TheBloke/zephyr-7B-alpha-AWQ --quantization awq --dtype float16
```
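For context (not confirmed as the fix here): vLLM pre-allocates a large fraction of GPU memory for the KV cache by default, so an OOM at startup can occur even when the quantized weights themselves fit. A sketch of a more conservative launch, assuming the real `--gpu-memory-utilization` and `--max-model-len` flags of `vllm.entrypoints.api_server`, with illustrative values:

```shell
# Sketch: cap the KV-cache context length and the fraction of GPU
# memory vLLM reserves. The values 4096 and 0.90 are assumptions
# for a 24 GB card, not settings confirmed in this thread.
python -m vllm.entrypoints.api_server \
    --model TheBloke/zephyr-7B-alpha-AWQ \
    --quantization awq \
    --dtype float16 \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.90
```

Whether this resolves the OOM on an RTX 3090 would need to be verified.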