Spaces: Running on A10G
Split/shard support
Note to self: Merge this too: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/64/files
Gladly! Update: there's also support for i-quants virtually ready, though it would benefit from a few more additions/fallbacks to handle some gotchas before submitting: https://huggingface.co/spaces/SixOpen/gguf-my-repo-sp_imat/tree/main I have access to Zero, but it doesn't seem to support the Docker SDK, so I can't load any layers to the GPU. It works regardless, just slowly on the free-tier CPU (taking a few hours), though that's not an issue on the original space.
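For context, the i-quant support mentioned above follows llama.cpp's usual importance-matrix workflow: compute an imatrix from calibration data, then quantize with it. A rough sketch, assuming llama.cpp binaries built in the container and a hypothetical calibration.txt file (tool names and flags may vary across llama.cpp versions):

```shell
# 1) Compute the importance matrix from calibration text
#    (calibration.txt is a placeholder for whatever dataset the space uses)
./imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize to an i-quant type using that importance matrix
./quantize --imatrix imatrix.dat model-f16.gguf model-IQ2_XS.gguf IQ2_XS
```

The fallbacks alluded to would cover cases like missing calibration data or quant types that don't require an imatrix.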
Result: just split a result using the space mirroring this PR.
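The split itself goes through llama.cpp's gguf-split tool. A minimal sketch of what the space runs, with hypothetical file names (exact flag names may differ between llama.cpp revisions):

```shell
# Split a quantized model into shards, capping tensors per shard
./gguf-split --split --split-max-tensors 256 model-Q4_K_M.gguf model-Q4_K_M

# The shards can later be recombined with the same tool
./gguf-split --merge model-Q4_K_M-00001-of-00002.gguf model-Q4_K_M-merged.gguf
```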
Awesome :) of course, will do so!
Sure thing! :) I might do it this weekend, though there's an impediment regarding the Dockerfile and putting the GPU to work, for which I haven't yet found a fix that follows best practices (other than through Dev Mode), and I haven't been able to replicate it locally either. Will look into it in a bit! :)
Interesting, I think in your start.sh you should have LLAMA_CUDA=1 make -j quantize gguf-split imatrix so that it compiles with CUDA support.
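To make that concrete, a hypothetical start.sh excerpt along those lines (paths are placeholders; LLAMA_CUDA=1 enables the CUDA backend in llama.cpp's Makefile of that era):

```shell
#!/bin/bash
set -e

# Build only the tools the space needs, with CUDA support enabled
cd /app/llama.cpp
LLAMA_CUDA=1 make -j quantize gguf-split imatrix

# ...then launch the app as before
cd /app
python app.py
```

Without the LLAMA_CUDA=1 flag the binaries build CPU-only, so imatrix computation can't offload layers to the A10G.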
Feel free to email me at vaibhav [at] huggingface [dot] co or message on twitter if you want to chat more about this, happy to help debug issues with you.
Sounds good, let's do that then :) I looked at the latest commits here, by the way; they're cool stuff! I was indeed using that CUDA flag, but good news: it turns out the space itself was the issue. After moving to a new one, everything is working well. Another odd thing, similar to this, that happened in another iteration of the space: in spite of a factory rebuild, once the auth expires it results in an endless redirect loop.