Converting to GGUF

#1
by gardner - opened

Thank you for creating this model. I am keen to try it out. I attempted to convert it to GGUF for use with llama.cpp, but it appears that either the tokenizer has been customized or the reference to it is lost somewhere during the conversion process. Can you please confirm the following (a short inspection sketch follows the list):

  • Which tokenizer should be used?
  • Was it customized at all?
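
To make the question concrete, here is a minimal inspection sketch (mine, not from the model card) that loads the repo's tokenizer via transformers and prints what it resolves to; trust_remote_code=True is needed because the repository ships custom code:

from transformers import AutoTokenizer

# Minimal sketch: report which tokenizer class and vocabulary size the repo
# resolves to. trust_remote_code=True is required by the custom code it ships.
tok = AutoTokenizer.from_pretrained(
    "ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1",
    trust_remote_code=True,
)
print(type(tok).__name__)  # tokenizer class
print(len(tok))            # vocab size; stock Llama 3 is 128256 incl. special tokens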

To convert this model to GGUF for use with llama.cpp, I ran the following commands:

# Download the model
huggingface-cli download ContactDoctor/Bio-Medical-MultiModal-Llama-3-8B-V1 \
    --local-dir $PWD/models/Bio-Medical-MultiModal-Llama-3-8B-V1 \
    --local-dir-use-symlinks False
# Cache the base model
huggingface-cli download openbmb/MiniCPM-Llama3-V-2_5
# Split the checkpoint into its language-model and vision components
python3 ./examples/llava/minicpmv-surgery.py -m $PWD/models/Bio-Medical-MultiModal-Llama-3-8B-V1
# Convert the image encoder to GGUF
python3 ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py \
    -m $PWD/models/Bio-Medical-MultiModal-Llama-3-8B-V1 \
    --minicpmv-projector $PWD/models/Bio-Medical-MultiModal-Llama-3-8B-V1/minicpmv.projector \
    --output-dir $PWD/models/Bio-Medical-MultiModal-Llama-3-8B-V1/ \
    --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5
# Convert the language model to GGUF
python3 ./convert_hf_to_gguf.py ./models/Bio-Medical-MultiModal-Llama-3-8B-V1/model
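
For reference, a quick hypothetical diagnostic (not part of the steps above) to see which tokenizer files the surgery step actually leaves in the split-out language-model directory, since the final conversion step goes looking for exactly these files:

from pathlib import Path

# Hypothetical check: list the tokenizer files convert_hf_to_gguf.py looks for,
# in the directory layout produced by minicpmv-surgery.py above.
model_dir = Path("models/Bio-Medical-MultiModal-Llama-3-8B-V1/model")
for name in ("tokenizer.model", "tokenizer.json", "tokenizer_config.json"):
    print(f"{name}: {'present' if (model_dir / name).exists() else 'missing'}")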

The final convert_hf_to_gguf.py step then fails with the following error:

INFO:hf-to-gguf:Set meta model                                                                                                                    
INFO:hf-to-gguf:Set model parameters                                                                                                              
INFO:hf-to-gguf:gguf: context length = 8192                                                                                                       
INFO:hf-to-gguf:gguf: embedding length = 4096                                                                                                     
INFO:hf-to-gguf:gguf: feed forward length = 14336                                                                                                 
INFO:hf-to-gguf:gguf: head count = 32                                                                                                             
INFO:hf-to-gguf:gguf: key-value head count = 8                                                                                                    
INFO:hf-to-gguf:gguf: rope theta = 500000.0                                                                                                       
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05                                                                                                    
INFO:hf-to-gguf:gguf: file type = 1                                                                                                               
INFO:hf-to-gguf:Set model tokenizer                                                                                                               
The repository for /home/user/src/llama.cpp/models/Bio-Medical-MultiModal-Llama-3-8B-V1/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//home/user/src/llama.cpp/models/Bio-Medical-MultiModal-Llama-3-8B-V1/model.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.                                                             
                                                                                                                                                  
Do you wish to run the custom code? [y/N] y                                                                                                       
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.                             
WARNING:hf-to-gguf:                                                                                                                               
                                                                                                                                                  
WARNING:hf-to-gguf:**************************************************************************************                                         
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!                                                                          
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:                                                                             
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet                                                 
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream                                                                 
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.                               
WARNING:hf-to-gguf:** ref:     https://github.com/ggerganov/llama.cpp/pull/6920                                                                   
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  1baddeb572cd9de2a6d36f2ad0c361490bf5447dafca20afbac625e9d37f18a5
WARNING:hf-to-gguf:**************************************************************************************                                         
WARNING:hf-to-gguf:


Traceback (most recent call last):
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 1469, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 692, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 709, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /home/user/src/llama.cpp/models/Bio-Medical-MultiModal-Llama-3-8B-V1/model/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 1472, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 784, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
  File "/home/user/src/llama.cpp/gguf-py/gguf/vocab.py", line 368, in __init__
    raise FileNotFoundError('Cannot find Llama BPE tokenizer')
FileNotFoundError: Cannot find Llama BPE tokenizer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 4067, in <module>
    main()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 4061, in main
    model_instance.write()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 391, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 384, in prepare_metadata
    self.set_vocab()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 1475, in set_vocab
    self._set_vocab_gpt2()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 628, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 472, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
  File "/home/user/src/llama.cpp/./convert_hf_to_gguf.py", line 619, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
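
For what it's worth: since this model appears to be a fine-tune of MiniCPM-Llama3-V-2_5, which uses the Llama 3 BPE tokenizer, one possible workaround is to map the unrecognized checksum to the stock Llama 3 pre-tokenizer. This is only a sketch, and it assumes the fine-tune did not change the pre-tokenization rules, which is exactly what I am asking you to confirm above:

# Sketch: in convert_hf_to_gguf.py, get_vocab_base_pre(), add a branch alongside
# the existing chkhsh checks. Assumption: this fine-tune kept Llama 3's BPE
# pre-tokenizer, so the checksum from the warning above maps to "llama-bpe".
if chkhsh == "1baddeb572cd9de2a6d36f2ad0c361490bf5447dafca20afbac625e9d37f18a5":
    res = "llama-bpe"

If that assumption is wrong, i.e. the tokenizer really was customized, a GGUF produced this way would tokenize text incorrectly, hence the questions above.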
SrikanthChellappa changed discussion status to closed
Contact Doctor Healthcare org

Thanks @gardner. Let me see if we can find time to get a GGUF created and hosted on HF in the next few days.
