invalid magic number: latest release of llama.cpp cannot load 13B GGML q4_0 model
executing this command:
.\build\bin\Release\main.exe -m ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
produces the following output with the error message:
main: build = 1018 (8e4364f)
main: seed = 1692754983
ggml_init_cublas: found 3 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6
Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6
Device 2: NVIDIA GeForce RTX 3080, compute capability 8.6
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from ./models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/llama-2-13b-chat.ggmlv3.q4_0.bin'
main: error: unable to load model
Rolling back llama.cpp to commit a113689 works.
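For reference, the magic 67676a74 in the error is the ASCII string "ggjt", i.e. the old pre-GGUF GGML v3 container format, which the new GGUF loader no longer recognises. A quick way to check which format a model file is in is to dump its first four bytes (the example below uses xxd; a GGUF file starts with the literal bytes "GGUF", while an old GGML v3 file carries the "ggjt" magic, which some tools may print byte-reversed):

# inspect the container magic of the model file
xxd -l 4 ./models/llama-2-13b-chat.ggmlv3.q4_0.bin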
Yeah, latest llama.cpp is no longer compatible with GGML models. The new model format, GGUF, was merged recently. As far as llama.cpp is concerned, GGML is now dead - though of course many third-party clients/libraries are likely to continue to support it for a lot longer. I need to update my GGML READMEs to mention this and will be doing this shortly.
I will be providing GGUF models for all my repos in the next 2-3 days. I'm waiting for another PR to merge, which will add improved k-quant quantisation formats.
For now, if you want to use llama.cpp you will need to downgrade it back to commit dadbed99e65252d79f81101a392d0d6497b86caa or earlier. Or use one of the llama.cpp binary releases from before GGUF was merged. Or use a third-party client like KoboldCpp, LM Studio, text-generation-webui, etc.
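For example, assuming a local git clone of llama.cpp built with CMake (the CUDA flag shown is an assumption based on the build in the original post; adjust to your setup), the downgrade could look like this:

# check out the last pre-GGUF commit, then rebuild
git checkout dadbed99e65252d79f81101a392d0d6497b86caa
cmake -B build -DLLAMA_CUBLAS=on
cmake --build build --config Release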
Look out for new -GGUF repos from me in the coming days. Or yes, you can convert them yourself using the script ggml_to_gguf.py now provided with llama.cpp.
I see, good to know. I was getting a similar error too. Thank you.
Why didn't they mention that SUPER IMPORTANT INFORMATION in the readme.md?!
They kind of do, but it's the kind of message that you probably won't register unless you already know what it means.
(Unless you meant me, in which case I've not yet updated all my pre-existing GGML repos since the launch of GGUF, but will be starting that process tomorrow, as well as providing GGUF versions for most of the existing GGML repos.)
Their README.md tutorial is still using GGML, without any warning that it doesn't work anymore.
Thanks for the exact commit tip.
For those interested and coming from https://replicate.com/blog/run-llama-locally, some notes:
- the command to run is not ggml_to_gguf.py but convert-llama-ggml-to-gguf.py
- you will need python3 and the numpy library; you can install numpy using pip3 install numpy
- the exact command should be something like this:
./convert-llama-ggml-to-gguf.py --eps 1e-5 -i ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -o ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin
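Once the conversion succeeds, the original command from the top of the thread should work again when pointed at the newly written GGUF file (the .gguf.bin name simply matches the -o argument above):

.\build\bin\Release\main.exe -m ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin -p "Building a website can be done in 10 simple steps:" -n 512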