llama.cpp hf to gguf not working

#2
by RameshRajamani - opened

INFO:hf-to-gguf:Loading model: 046a9891f7a2b94706a0ba5c1b93c6c835000f15

ERROR:hf-to-gguf:Model LlavaForConditionalGeneration is not supported

Is there any way I can convert this to GGUF? I wanted to create an API locally.

RameshRajamani changed discussion title from python convert_hf_to_gguf.py models--mistral-community--pixtral-12b\snapshots\046a9891f7a2b94706a0ba5c1b93c6c835000f15 --outfile pixtral.gguf to llama.cpp hf to gguf not working

Transformers doesn't even support Pixtral yet, so it stands to reason that llama.cpp doesn't either.

If you need to run locally right now, you can use MistralAI's official repository with vLLM.

I imagine the reason you wanted to convert to GGUF to begin with is to quantize it; vLLM can quantize automatically at load time.
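For example, something like this minimal sketch should work with a recent vLLM; the fp8 quantization choice, max_model_len, and exact multimodal message format are assumptions you may need to adjust for your vLLM version:

```python
# Minimal sketch: run Pixtral locally with vLLM and quantize at load time.
# Assumptions: a recent vLLM with Pixtral support; "fp8" is just one example
# of on-the-fly quantization; adjust max_model_len to fit your GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Pixtral-12B-2409",
    tokenizer_mode="mistral",   # Pixtral ships a mistral-format tokenizer
    quantization="fp8",         # quantize weights at load time, no GGUF needed
    max_model_len=16384,
)

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/some_image.png"}},
    ],
}]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```

If you'd rather have a local HTTP API than the offline engine, `vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral` should expose the same model behind an OpenAI-compatible endpoint.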

Unofficial Mistral Community org

Transformers does support Pixtral, so I'm not sure I understand your comment: https://github.com/huggingface/transformers/pull/33449 🤗

In the dev version, sure, but not in the release version that llama.cpp points to. I could've been more specific about that.

To be more precise:

"llama.cpp doesn't support Pixtral in any meaningful capacity yet, and the version of Transformers that llama.cpp's conversion script depends on is too early. Once Transformers v45 is fully released, we can then expect llama.cpp to implement support for Pixtral, but likely not sooner."

The model is a LLaVA-based model.
So to make the GGUF you need to extract the components individually, i.e. the vision encoder needs to be pulled out into its own projector file (the mmproj, e.g. mmproj.bin), and then the language model can be converted to GGUF. The model will be in two parts: one is the GGUF and the other the projector .bin. By loading them both (or keeping them in the same folder) with LM Studio, it will detect that it is a vision model, load both components, and offer you the multimodal textbox! A rough sketch of the splitting step is below.
It's not an easy procedure! The llama.cpp people need to implement the vision model in llama.cpp before it will work correctly!
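Roughly, that splitting step amounts to separating the checkpoint's weights by prefix. Here is a hypothetical sketch using the Transformers LLaVA class from the error above; the key prefixes and output file names are assumptions, not the exact llama.cpp recipe (the real tooling lives in llama.cpp's examples/llava and also handles the GGUF conversion itself):

```python
# Hypothetical sketch of the "split into two parts" step for a LLaVA-style
# model: separate the vision tower / projector weights from the language
# model weights. Prefixes and file names are assumptions about the layout.
import torch
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b", torch_dtype=torch.float16
)
state = model.state_dict()

# Vision encoder + multimodal projector weights -> one file (the "mmproj" part)
vision_part = {k: v for k, v in state.items()
               if k.startswith(("vision_tower.", "multi_modal_projector."))}
torch.save(vision_part, "vision_projector.bin")

# Language-model weights -> the part a text-only GGUF conversion would consume
text_part = {k: v for k, v in state.items() if k.startswith("language_model.")}
torch.save(text_part, "language_model.bin")
```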
The next step, which is also a problem, is saving the quantization!

So if the model can be saved as 4-bit, you will be able to enjoy the same service as a GGUF, but you will use bitsandbytes and Transformers to load it. It will use reduced memory on the hardware, so it will still be worth running! A minimal sketch is below.
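Something like this, assuming a Transformers version that already includes Pixtral support (the dev / 4.45+ builds mentioned above) and a CUDA GPU:

```python
# Minimal sketch: load the model in 4-bit with bitsandbytes instead of GGUF.
# The model id comes from the path in the original conversion command.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
)

model = LlavaForConditionalGeneration.from_pretrained(
    "mistral-community/pixtral-12b",
    quantization_config=bnb_config,  # weights quantized to 4-bit at load time
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")
```

You could then wrap the loaded model in a small local web server to get the API you were after, without needing a GGUF at all.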
