invalid magic number: latest release of llama.cpp cannot load 13B GGML q4_0 model
executing this command:
.\build\bin\Release\main.exe -m ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
produces the following output with the error message:
main: build = 1018 (8e4364f)
main: seed = 1692754983
ggml_init_cublas: found 3 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6
Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6
Device 2: NVIDIA GeForce RTX 3080, compute capability 8.6
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from ./models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/llama-2-13b-chat.ggmlv3.q4_0.bin'
main: error: unable to load model
Rolling back llama.cpp to commit a113689 works.
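For reference, the magic 67676a74 in the error is the ASCII string "ggjt", i.e. the old pre-GGUF GGML v3 container format, which the new GGUF loader no longer recognises. A quick way to check which format a model file is in is to dump its first four bytes (the example below uses xxd; a GGUF file starts with the literal bytes "GGUF", while an old GGML v3 file carries the "ggjt" magic, which some tools may print byte-reversed):

# inspect the container magic of the model file
xxd -l 4 ./models/llama-2-13b-chat.ggmlv3.q4_0.bin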
Yeah, latest llama.cpp is no longer compatible with GGML models. The new model format, GGUF, was merged recently. As far as llama.cpp is concerned, GGML is now dead - though of course many third-party clients/libraries are likely to continue to support it for a lot longer. I need to update my GGML READMEs to mention this and will be doing this shortly.
I will be providing GGUF models for all my repos in the next 2-3 days. I'm waiting for another PR to merge, which will add improved k-quant quantisation formats.
For now, if you want to use llama.cpp you will need to downgrade it back to commit dadbed99e65252d79f81101a392d0d6497b86caa or earlier. Or use one of the llama.cpp binary releases from before GGUF was merged. Or use a third-party client like KoboldCpp, LM Studio, text-generation-webui, etc.
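For example, assuming a local git clone of llama.cpp built with CMake (the CUDA flag shown is an assumption based on the build in the original post; adjust to your setup), the downgrade could look like this:

# check out the last pre-GGUF commit, then rebuild
git checkout dadbed99e65252d79f81101a392d0d6497b86caa
cmake -B build -DLLAMA_CUBLAS=on
cmake --build build --config Release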
Look out for new -GGUF repos from me in the coming days. Or yes, you can convert them yourself using the script ggml_to_gguf.py now provided with llama.cpp.
I see, good to know. I was getting a similar error too. Thank you.
Why didn't they mention that SUPER IMPORTANT INFORMATION in the readme.md?!
They kind of do, but it's the kind of message that you probably won't register unless you already know what it means.
(Unless you meant me, in which case I've not yet updated all my pre-existing GGML repos since the launch of GGUF, but will be starting that process tomorrow, as well as providing GGUF versions for most of the existing GGML repos.)
Their README.md tutorial is still using GGML, without any warning that it doesn't work anymore.
Thanks for the exact commit tip.
For those interested and coming from https://replicate.com/blog/run-llama-locally, some notes:
- the command to run is not ggml_to_gguf.py but convert-llama-ggml-to-gguf.py
- you will need python3 and the numpy library; you can install numpy using pip3 install numpy
- the exact command should be something like this:
./convert-llama-ggml-to-gguf.py --eps 1e-5 -i ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -o ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin
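Once the conversion succeeds, the original command from the top of the thread should work again when pointed at the newly written GGUF file (the .gguf.bin name simply matches the -o argument above):

.\build\bin\Release\main.exe -m ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin -p "Building a website can be done in 10 simple steps:" -n 512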