general.architecture = 'llama' in .gguf metadata
#6
by mattjcly
Hi, I have a question about what I'm seeing in this model's GGUF metadata (consistently across GGUF preview tools such as https://github.com/ggerganov/llama.cpp/blob/4e96a812b3ce7322a29a3008db2ed73d9087b176/gguf-py/scripts/gguf-dump.py, https://netron.app/, and LM Studio). It appears that general.architecture = 'llama' and general.name = 'LLaMA v2':
```
python3 gguf-dump.py Phi-3-mini-4k-instruct-q4.gguf
* Loading: Phi-3-mini-4k-instruct-q4.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 28 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 291
3: UINT64 | 1 | GGUF.kv_count = 25
4: STRING | 1 | general.architecture = 'llama'
5: STRING | 1 | general.name = 'LLaMA v2'
6: UINT32 | 1 | llama.vocab_size = 32064
7: UINT32 | 1 | llama.context_length = 4096
8: UINT32 | 1 | llama.embedding_length = 3072
9: UINT32 | 1 | llama.block_count = 32
10: UINT32 | 1 | llama.feed_forward_length = 8192
11: UINT32 | 1 | llama.rope.dimension_count = 96
12: UINT32 | 1 | llama.attention.head_count = 32
13: UINT32 | 1 | llama.attention.head_count_kv = 32
14: FLOAT32 | 1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
15: FLOAT32 | 1 | llama.rope.freq_base = 10000.0
16: UINT32 | 1 | general.file_type = 15
17: STRING | 1 | tokenizer.ggml.model = 'llama'
18: [STRING] | 32064 | tokenizer.ggml.tokens
19: [FLOAT32] | 32064 | tokenizer.ggml.scores
20: [INT32] | 32064 | tokenizer.ggml.token_type
21: UINT32 | 1 | tokenizer.ggml.bos_token_id = 1
22: UINT32 | 1 | tokenizer.ggml.eos_token_id = 32000
23: UINT32 | 1 | tokenizer.ggml.unknown_token_id = 0
24: UINT32 | 1 | tokenizer.ggml.padding_token_id = 32000
25: BOOL | 1 | tokenizer.ggml.add_bos_token = True
26: BOOL | 1 | tokenizer.ggml.add_eos_token = False
27: STRING | 1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{{'<|' + message"
28: UINT32 | 1 | general.quantization_version = 2
```
Is this intentional or a bug?
For comparison, Phi-2 from https://huggingface.co/TheBloke/phi-2-GGUF shows:
```
4: STRING | 1 | general.architecture = 'phi2'
5: STRING | 1 | general.name = 'Phi2'
6: UINT32 | 1 | phi2.context_length = 2048
7: UINT32 | 1 | phi2.embedding_length = 2560
8: UINT32 | 1 | phi2.feed_forward_length = 10240
9: UINT32 | 1 | phi2.block_count = 32
10: UINT32 | 1 | phi2.attention.head_count = 32
11: UINT32 | 1 | phi2.attention.head_count_kv = 32
12: FLOAT32 | 1 | phi2.attention.layer_norm_epsilon = 9.999999747378752e-06
13: UINT32 | 1 | phi2.rope.dimension_count = 32
```
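For reference, the same keys can also be read programmatically with the gguf package that gguf-dump.py builds on; here is a minimal sketch (assuming `pip install gguf` and a local copy of the file named as above):

```python
# Minimal sketch: read selected GGUF key/value pairs with gguf-py.
# Assumes `pip install gguf` and a local copy of the model file.
from gguf import GGUFReader, GGUFValueType

reader = GGUFReader("Phi-3-mini-4k-instruct-q4.gguf")

for key in ("general.architecture", "general.name"):
    field = reader.fields[key]
    # GGUF strings are stored as raw UTF-8 bytes in the field's last part.
    if field.types[0] == GGUFValueType.STRING:
        print(f"{key} = {bytes(field.parts[-1]).decode('utf-8')!r}")
```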
That's because we are still waiting for https://github.com/abetlen/llama-cpp-python to add support for Phi-3. In the meantime, we used the "Llama" conversion script so that existing GGUF use cases don't break.
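In practice this means the file loads in existing llama.cpp-based runtimes as-is, since they dispatch on general.architecture. A minimal sketch with llama-cpp-python (assuming `pip install llama-cpp-python` and the quantized file above; the prompt is just an illustration):

```python
# Sketch: because general.architecture is 'llama', the file loads in
# llama-cpp-python without Phi-3-specific support.
# Assumes `pip install llama-cpp-python` and a local copy of the file.
from llama_cpp import Llama

llm = Llama(model_path="Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)
out = llm("Q: What is the capital of France? A:", max_tokens=8)
print(out["choices"][0]["text"])
```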
gugarosa changed discussion status to closed