Is this model based on the `chat` or the `chat-hf` model of Llama 2?
Llama 2 has four kinds of models:
- Llama2
- Llama2-hf
- Llama2-chat
- Llama2-chat-hf
Which one is this model based on?
From the first line of the Model card: "These files are GPTQ model files for Meta's Llama 2 13B-chat"
Which links to:
https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
Oh, the information is hidden in the link!
This is 13B-Chat, but actually my link is a little wrong: I based this on 13B-Chat, not 13B-Chat-HF. I intended to base it on 13B-Chat-HF, because that's in the right format for me to quantise, but when I tried, it failed with a weird quantisation problem.
Ultimately 13B-Chat and 13B-Chat-HF should be identical, aside from being in different formats (Meta's PTH checkpoints vs the HF pytorch_model.bin / model.safetensors layout). But I have found problems using Meta's HF-format repos.
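To make the format point concrete: an HF-format repo loads straight into Transformers, while Meta's raw PTH checkpoints don't. A minimal sketch, assuming a hypothetical local path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# An HF-format directory (config.json + pytorch_model.bin or
# model.safetensors) loads directly; a raw Meta PTH checkpoint would not.
# "./llama-2-13b-chat-hf" is a hypothetical local path.
model = AutoModelForCausalLM.from_pretrained("./llama-2-13b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("./llama-2-13b-chat-hf")
```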
So in the end, my quants were made like this:
- Download the 13B-Chat PTH files directly from Meta via their download.sh
- Convert to HF myself, using Transformers' convert_llama_weights_to_hf.py (see the sketch after this list)
- Then quantise as usual
- I also then uploaded the HF files I converted myself to my -fp16 repos.
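As a rough sketch of the convert-then-quantise steps: the conversion script ships with Transformers (its flags vary between versions), and AutoGPTQ is shown here as one way to do the GPTQ step, not necessarily the exact tooling behind these quants. All paths are placeholders.

```python
# Step 2: convert Meta's PTH checkpoint to HF format with the script that
# ships with Transformers (flags vary between versions), e.g.:
#   python -m transformers.models.llama.convert_llama_weights_to_hf \
#       --input_dir ./llama-2-13b-chat --model_size 13B --output_dir ./llama-2-13b-chat-hf

# Step 3: quantise the converted HF model. AutoGPTQ is one option
# (a sketch, not necessarily the exact tooling used for these quants).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

hf_dir = "./llama-2-13b-chat-hf"     # hypothetical converted-model path
out_dir = "./llama-2-13b-chat-gptq"  # hypothetical output path

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit GPTQ
    group_size=128,  # a commonly used group size
    desc_act=False,
)

tokenizer = AutoTokenizer.from_pretrained(hf_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(hf_dir, quantize_config)

# GPTQ needs calibration examples; a real run would use a proper dataset.
examples = [tokenizer("GPTQ calibration needs representative text.")]

model.quantize(examples)
model.save_quantized(out_dir, use_safetensors=True)
```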
I don't know why their HF files are causing problems; I've yet to investigate that.