Tutorial
Please provide a tutorial or a link on how I can integrate or use these models.
These are to be used with llama.cpp
https://github.com/ggerganov/llama.cpp
All instructions will be in that repo.
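If it helps, here is a minimal, untested sketch of loading one of these ggml files from Python via the llama-cpp-python bindings (not part of this repo; the file name and prompt are just examples, use whichever file you downloaded):

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path is an example; point it at whichever ggml file you downloaded.
from llama_cpp import Llama

llm = Llama(model_path="./ggml-model-q4_0.bin", n_ctx=512)

output = llm(
    "### Instruction:\nSay hello.\n\n### Response:\n",
    max_tokens=128,
    stop=["### Instruction:"],
)
print(output["choices"][0]["text"])
```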
By the way, I came here because Dalai couldn't download Alpaca 13B from this link: https://huggingface.co/Pi3141/alpaca-13B-ggml/resolve/main/ggml-model-q4_0.bin
Dalai is no longer supported; it's outdated, and I won't keep q4_0 in the old format for it.
Okay, thank you. I want to ask: is each file in your repo independent of the others?
Yep, they are all independent.
Which one is the best in your opinion? I'm trying to build a chat bot for Discord ;p
Q4_0 and Q4_2
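If you're building the Discord bot in Python, something along these lines might work, a rough, untested sketch assuming discord.py 2.x and the llama-cpp-python bindings (the bot token, model file name, and prompt template below are placeholders):

```python
# Rough sketch: discord.py 2.x + llama-cpp-python. Token, model file name,
# and prompt template are placeholders, not values from this repo.
import discord
from llama_cpp import Llama

llm = Llama(model_path="./ggml-model-q4_0.bin", n_ctx=512)

intents = discord.Intents.default()
intents.message_content = True  # needed to read message text in discord.py 2.x
client = discord.Client(intents=intents)

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    if message.content.startswith("!ask "):
        prompt = message.content[len("!ask "):]
        # llama-cpp-python is blocking; fine for a toy bot, move it to a
        # worker thread or queue for anything serious
        result = llm(
            f"### Instruction:\n{prompt}\n\n### Response:\n",
            max_tokens=200,
            stop=["### Instruction:"],
        )
        await message.channel.send(result["choices"][0]["text"].strip())

client.run("YOUR_DISCORD_BOT_TOKEN")
```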
I couldn't find out how I can use my GPU to make it faster. It gets stuck after running it in Docker.
You can't use a GPU with these files. If you want GPU inference, you'll need to run the model with PyTorch; this isn't the right repo for you in that case.
What's the difference between the two? I have a fairly strong CPU and GPU. What do you recommend?
The GPU is faster, but you need at least 12 GB of VRAM.
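For reference, a GPU run with PyTorch would look roughly like this, a sketch assuming the Hugging Face transformers LLaMA classes plus bitsandbytes/accelerate and a full-precision HF checkpoint, not these ggml files; the model id is a placeholder:

```python
# Sketch of GPU inference with PyTorch/transformers (not for these ggml files).
# "some-org/alpaca-13b-hf" is a placeholder model id. 13B weights are roughly
# 13 GB at 8-bit, which is where the multi-GB VRAM requirement comes from.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("some-org/alpaca-13b-hf")
model = LlamaForCausalLM.from_pretrained(
    "some-org/alpaca-13b-hf",
    load_in_8bit=True,   # requires bitsandbytes
    device_map="auto",   # requires accelerate
)

inputs = tokenizer("Say hello.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```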
Any idea why the hashes for q4_1 changed? IIRC that format has not been modified since ggjt dropped.
IIRC llama.cpp made some changes to the quantization.
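If you want to check which version of a file you actually have, comparing its SHA256 against the hash shown for that file on the repo's Files page is a quick way to tell; a small sketch (the file name is an example):

```python
# Compute a local file's SHA256 to compare with the hash listed on the model page.
# The file name is an example.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of("ggml-model-q4_1.bin"))
```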