Anyone able to get this working on koboldcpp?

#2 by lemon07r - opened

Crashes when I try to load the model; same issue with QuantFactory's quant of this model too. Maybe koboldcpp doesn't have the required upstream merges from llama.cpp yet? Wondering if someone can confirm. I tested LostRuins' koboldcpp with both OpenBLAS and Vulkan, and YellowRose's hipBLAS fork; neither can load this model. Tested with Q4_K_M.

Yes, I'm getting the same crash. We'll have to wait until KoboldCpp picks up this change:

https://github.com/LostRuins/koboldcpp/commit/889bdd76866ea31a7625ec2dcea63ff469f3e981

If you build it from source, you can use the "concedo_experimental" branch. As of now, it has PR #7063 from upstream, which is the new tokenizer.
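For reference, a minimal sketch of building that branch from source and loading a quant, assuming a typical Linux setup (the exact Makefile flags, koboldcpp.py options, and the model filename below are examples and may differ for your version):

```bash
# Clone koboldcpp and switch to the experimental branch
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
git checkout concedo_experimental

# Build; LLAMA_OPENBLAS=1 is the usual CPU/OpenBLAS flag
# (check the Makefile of your checkout for Vulkan/CLBlast flags)
make LLAMA_OPENBLAS=1 -j"$(nproc)"

# Try loading the quant (path/filename here is just a placeholder)
python koboldcpp.py --model ./model.Q4_K_M.gguf
```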

Thanks for looking into this. Yeah, another one of those "update your backend" changes.

Hey, just tested with the latest Kobold release, and it's working great:

https://github.com/LostRuins/koboldcpp/releases

Working great in the latest kcpp release. The i-quant versions will work for CPU-only inference, but won't work for me when I do any sort of GPU offloading (CLBlast, Vulkan, or otherwise) on my 6900 XT. I tried IQ4 and IQ3 quants; they work with CLBlast but not Vulkan when I try to offload any amount of layers.

EDIT - Here's the last message I see on screen before it crashes:

GGML_ASSERT: ggml-vulkan.cpp:2940: !qx_needs_dequant || to_fp16_vk_0 != nullptr

That's expected; you can see the support table here:

https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix
