IQ1_S or IQ1_M for low RAM/VRAM computers
Or if you upload the imatrix.dat, that would be very welcome for low-end computers.
So I tried to do 1-bit and it asked for imatrix data! I have never done that; could you tell me how? I can build the imatrix and share all the IQ1 models quickly.
- Grab a copy of group_10_merged.txt from https://github.com/ggerganov/llama.cpp/discussions/5263
- W/ the f16 gguf file, run: ~/llama.cpp/imatrix -m ggml-model.f16.gguf -f group_10_merged.txt
- Wait a while;
- When running quantize, add this arg: --imatrix imatrix.dat
\o/
(The quality of all your other low-bit-rate quantizations will improve as well!)
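For reference, here is a minimal end-to-end sketch of those two steps, assuming the stock imatrix and quantize binaries from a llama.cpp checkout and placeholder file names:

```sh
# Build the importance matrix from the f16 model and a calibration text
# (imatrix writes imatrix.dat by default; -o just makes the output explicit)
~/llama.cpp/imatrix -m ggml-model.f16.gguf -f group_10_merged.txt -o imatrix.dat

# Quantize with the importance matrix applied
~/llama.cpp/quantize --imatrix imatrix.dat ggml-model.f16.gguf ggml-model.IQ1_S.gguf IQ1_S
```

quantize takes the input gguf, the output gguf, and the quantization type as positional arguments, so the same command shape works for any of the IQ/Q types.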
Will do this in an hour! Thanks a lot! So I do this for IQ1_S and IQ1_M?
At minimum, yes. The same imatrix.dat file can be used for all quantization levels, though - it would be good to remake the IQ* quants at least, plus any of the others you can!
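Since the same imatrix.dat works for every level, a rough sketch of re-quantizing several types in one go (the quant list and file names here are just examples):

```sh
# Reuse one imatrix.dat across several quantization types
for T in IQ1_S IQ1_M IQ2_XS IQ3_XXS Q4_K_M; do
  ~/llama.cpp/quantize --imatrix imatrix.dat ggml-model.f16.gguf ggml-model.$T.gguf $T
done
```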
You seem to have more knowledge about this imatrix - is it used for all the quantized models starting with IQ, regardless of their size? If so, why doesn't it happen automatically inside the quantize script? (Just asking out of curiosity.)
I still have the 16-bit, which takes forever to make. I will do the imatrix and start with the 1-bit quants, then see what other IQ variants I have.
imatrix.dat is effective for quants Q5_K_M or smaller. Even the perplexity of Q4_0 or Q3_K_S will improve with imatrix.dat.
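If you want to measure that, llama.cpp also ships a perplexity tool; a sketch of comparing the same quant made with and without the imatrix (file names are placeholders, and wiki.test.raw is just one common evaluation text):

```sh
# Lower perplexity on the same text means the quant preserved more of the model
~/llama.cpp/perplexity -m ggml-model.Q3_K_S.gguf -f wiki.test.raw
~/llama.cpp/perplexity -m ggml-model.Q3_K_S-imatrix.gguf -f wiki.test.raw
```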
So not the groups_merged.txt, but the group_10_merged.txt?
I prefer groups_merged.txt but it’s up to you. Someone uses wiki.train.raw from wikitext and it is very large.
OK, I'll go with groups_merged.txt, which seems to be more diverse.
system_info: n_threads = 64 / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
compute_imatrix: tokenizing the input ..
compute_imatrix: tokenization took 110.547 ms
compute_imatrix: computing over 105 chunks with batch_size 512
compute_imatrix: 62.96 seconds per pass - ETA 1 hours 50.17 minutes
[1]2.9595,[2]2.4039,
I have uploaded both IQ1_S and IQ1_M; the IQ1_M took a long time! I think the imatrix made this one take much longer. I'll see if I can evaluate the other quants and see how much difference the imatrix would make.
I really appreciate it! Thank you very much!!!
Thank you for sharing how to do imatrix, appreciate it! :)