Is this model based on the `chat` or the `chat-hf` model of Llama 2?
Llama 2 has four kinds of models:
- Llama2
- Llama2-hf
- Llama2-chat
- Llama2-chat-hf
Which one is this model based on?
From the first line of the Model card: "These files are GPTQ model files for Meta's Llama 2 13B-chat"
Which links to:
https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
Oh, the information is hidden in the link!
This is 13B-Chat, but actually my link is a little wrong: I based this on 13B-Chat, not 13B-Chat-HF. I intended to base it on 13B-Chat-HF, because that's in the right format for me to quantise, but when I tried, it failed with a weird quantisation problem.
Ultimately 13B-Chat and 13B-Chat-HF should be identical, aside from being in different formats (Meta's PTH checkpoints vs the HF pytorch_model.bin / model.safetensors layout). But I have found problems using Meta's HF-format repos.
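To make the format point concrete: an HF-format repo loads straight into Transformers, while Meta's raw PTH checkpoints don't. A minimal sketch, assuming a hypothetical local path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# An HF-format directory (config.json + pytorch_model.bin or
# model.safetensors) loads directly; a raw Meta PTH checkpoint would not.
# "./llama-2-13b-chat-hf" is a hypothetical local path.
model = AutoModelForCausalLM.from_pretrained("./llama-2-13b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("./llama-2-13b-chat-hf")
```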
So in the end, my quants were made like this:
- Download the 13B-Chat PTH files directly from Meta via their download.sh
- Convert to HF myself, using Transformers' convert_llama_weights_to_hf.py (see the sketch after this list)
- Then quantise as usual
- I also then uploaded the HF files I converted myself to my -fp16 repos.
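As a rough sketch of the convert-then-quantise steps: the conversion script ships with Transformers (its flags vary between versions), and AutoGPTQ is shown here as one way to do the GPTQ step, not necessarily the exact tooling behind these quants. All paths are placeholders.

```python
# Step 2: convert Meta's PTH checkpoint to HF format with the script that
# ships with Transformers (flags vary between versions), e.g.:
#   python -m transformers.models.llama.convert_llama_weights_to_hf \
#       --input_dir ./llama-2-13b-chat --model_size 13B --output_dir ./llama-2-13b-chat-hf

# Step 3: quantise the converted HF model. AutoGPTQ is one option
# (a sketch, not necessarily the exact tooling used for these quants).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

hf_dir = "./llama-2-13b-chat-hf"     # hypothetical converted-model path
out_dir = "./llama-2-13b-chat-gptq"  # hypothetical output path

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit GPTQ
    group_size=128,  # a commonly used group size
    desc_act=False,
)

tokenizer = AutoTokenizer.from_pretrained(hf_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(hf_dir, quantize_config)

# GPTQ needs calibration examples; a real run would use a proper dataset.
examples = [tokenizer("GPTQ calibration needs representative text.")]

model.quantize(examples)
model.save_quantized(out_dir, use_safetensors=True)
```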
I don't know why their HF files are causing problems; I've yet to investigate that.