General discussion.

#1
by Lewdiculous - opened

Call me clueless, but I swore there were at least some general prebuilt executables for Linux in the regular llama.cpp releases. Well, it's all macOS and Windows. My day is ruined.

Owner

Can't blame them; it's too much overhead when people who use Linux should already know how to build their own packages.

So I didn't want to believe this...

After some testing, making the actual quants is really slow; it's recommended to only use it for the initial FP16 GGUF and imatrix.dat generation.

...because I was thinking that "it can't be that bad".

If anyone also thought that, well...

It actually is very slow. I don't want to imagine what quanting the new smaller stuff like IQ3/IQ2 would look like. I used the free Colab tier, but I don't think that would scale.

But!

It's really not a bad solution if you need to generate the imatrix data and don't have the hardware for it. That part is pretty fast, as it's GPU-bound.
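For reference, a minimal sketch of that workflow, assuming the post-rename llama.cpp tool names discussed later in this thread, with placeholder model and calibration file names:

  # convert the HF model to an FP16 GGUF (model_dir is a placeholder)
  python convert_hf_to_gguf.py model_dir --outtype f16 --outfile model-f16.gguf

  # generate imatrix.dat on the GPU; -ngl 99 offloads all layers
  ./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat -ngl 99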

Got this error running the notebook:

(screenshot of the error attached)

Owner

@Marcus-Arcadius

The script is broken because of upstream changes.
I don't have time to fix it at the moment.

This is not a good way to do it:

  • Colab has limited storage space for GPU instances
  • Colab only has 2 CPU cores

Recommended to do it locally or on another cloud provider (paid Colab isn't great).


I'll probably do it locally, but I've got to figure out how to do it 😅

Owner

If you are on Windows, give https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script a try.

I am on Linux

Owner

@Marcus-Arcadius

You will have to compile llama.cpp from source.

I was gone for a couple of months, so I am also unsure how to do it now.
Some of the build flags changed and I can't get it to compile with CUDA (most likely due to me being on Arch).

When I get it working, I will see if I can add Linux support to the script.
It will take a while, however; I am not a coder and I don't have much free time.
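For anyone following along, a minimal from-source build sketch with CUDA enabled, using the make invocation that ends up working later in this thread:

  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  make -j 8 GGML_CUDA=1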


I'm also on Arch, so it looks like we're in the same dilemma 😂

Owner

I think it is either GCC or the new 555 NVIDIA drivers :|

If I figure it out, I'll let you know.


@Marcus-Arcadius

They have changed the naming for a few things. I made some changes reflecting that in the Windows script, so you should be good to start there as a reference: the convert script changed to underscores instead of hyphens, and the executables received a llama- prefix:

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/commit/234f95c659ecf10213bf0bb51344d098943dc641

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/commit/8d0a75b62cae4261ed71cd4dbdc396fc444b053b
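In concrete terms, the upstream renames look like this (old name on the left):

  convert-hf-to-gguf.py  ->  convert_hf_to_gguf.py   (hyphens to underscores)
  ./quantize             ->  ./llama-quantize        (llama- prefix)
  ./imatrix              ->  ./llama-imatrix
  ./main                 ->  ./llama-cli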

I think llama.cpp now provides pre-built Linux binaries? They are tagged Ubuntu, so I'm imagining they are expected to be used for servers. I'm not too familiar with the Linux side of things or the broader compatibility situation across distributions; my experience is basically just Ubuntu on the server side.

Owner

@Marcus-Arcadius

I finally got time to figure out the issue.

You need to set your CUDA architecture.

Example:

  make -j 16 GGML_CUDA=1 CUDA_POWER_ARCH=75

Owner

@Marcus-Arcadius

Never mind, you just need to run:

  make -j 8 GGML_CUDA=1

I added Linux support to the script
https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/discussions/36
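For completeness, once llama.cpp builds, the actual quantization step is a single command; a minimal sketch, assuming an FP16 GGUF and an imatrix.dat generated as described earlier (file names are placeholders):

  ./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M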
