How could I deploy liuhaotian/llava-v1.5-7b on a server?
Hi,
I've got llama.cpp working with ggml-model-q4_k.gguf on my notebook.
Now, I'm trying to run:
python3 -m llava.serve.controller --host 0.0.0.0 --port 10000
python3 -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path liuhaotian/llava-v1.5-7b --load-4bit
and I'm getting this error:
modeling_utils.py", line 2842, in from_pretrained
2023-11-21 16:41:10 | ERROR | stderr |     raise ValueError(
2023-11-21 16:41:10 | ERROR | stderr | ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have
2023-11-21 16:41:10 | ERROR | stderr | enough GPU RAM to fit the quantized model. If you want to dispatch the model on the
2023-11-21 16:41:10 | ERROR | stderr | CPU or the disk while keeping these modules in 32-bit, you need to set
2023-11-21 16:41:10 | ERROR | stderr | load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained.
2023-11-21 16:41:10 | ERROR | stderr | Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
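From the error it looks like the 4-bit bitsandbytes load can't fit everything into GPU memory, so accelerate places some modules on the CPU/disk and from_pretrained refuses to quantize them. For reference, this is roughly what the linked transformers docs describe, as a sketch only: it loads the checkpoint directly rather than through llava.serve.model_worker, the BitsAndBytesConfig field names come from the transformers docs, and importing LlavaLlamaForCausalLM this way is my assumption about the llava package.

# Sketch based on the transformers CPU/GPU offload docs, not on the worker's own loading path.
import torch
from transformers import BitsAndBytesConfig
from llava.model import LlavaLlamaForCausalLM  # assumption: exported by the llava package

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    # Config-level name for the load_in_8bit_fp32_cpu_offload knob the error mentions:
    # modules that end up on CPU/disk are kept un-quantized in fp32 instead of raising.
    llm_int8_enable_fp32_cpu_offload=True,
)

model = LlavaLlamaForCausalLM.from_pretrained(
    "liuhaotian/llava-v1.5-7b",
    quantization_config=quant_config,
    device_map="auto",          # the docs show an explicit layer->device dict; "auto" lets accelerate split layers
    torch_dtype=torch.float16,
)

That only offloads the overflow, though; the quantized part still needs a CUDA GPU.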
What should I do to run this model on CPU only?
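For the CPU-only case, my understanding is that the bitsandbytes --load-4bit path requires a CUDA GPU, so the unverified sketch below skips quantization and asks LLaVA's own loader for a CPU placement. The device and device_map arguments are assumptions about the loader's signature on my side and may not exist in every version.

# Unverified sketch: load LLaVA v1.5 7B unquantized on the CPU via LLaVA's builder.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    load_4bit=False,          # bitsandbytes 4-bit kernels need CUDA, so keep quantization off
    device_map={"": "cpu"},   # assumption: force every module onto the CPU
    device="cpu",             # assumption: this argument may not exist in older LLaVA versions
)
model = model.float()         # the loader defaults to float16, which many CPU ops don't support

Failing that, since the q4_k GGUF already runs under llama.cpp on my notebook, pointing llama.cpp at the same GGUF on the server might be the more practical CPU-only route.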
Thanks.
@andreydmitr20 Could you please help me fine-tune this model? Please let me know when we can connect on this.