unable to load 4-bit quantized variant with llama.cpp
#31 opened by sunnykusawa
Getting this error:
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 62.6 GiB for an array with shape (131072, 128256) and data type float32
I am using a 4-bit quantized LLM, so why is it expecting 62.6 GiB for an array with shape (131072, 128256) and data type float32?
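One plausible explanation (an assumption, not confirmed by the error alone): the shape matches Llama 3's defaults, where 131072 is the maximum context length and 128256 is the vocabulary size. That suggests this is not the quantized weights at all but a separate runtime buffer (e.g., logits over the full context) allocated in float32, which 4-bit weight quantization does not shrink. The arithmetic checks out against the error message:

```python
# Hypothetical interpretation of the array shape in the error:
# 131072 = assumed max context length, 128256 = assumed vocab size (Llama 3).
n_ctx, n_vocab = 131072, 128256
bytes_per_f32 = 4  # float32, as stated in the error

total_bytes = n_ctx * n_vocab * bytes_per_f32
gib = total_bytes / 2**30
print(f"{gib:.1f} GiB")  # 62.6 GiB, matching the reported allocation
```

If that interpretation is right, lowering the context size when loading the model (llama.cpp exposes this via `--ctx-size` / `-c`) should shrink the allocation proportionally.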