Update README.md

README.md

@@ -18,19 +18,18 @@ tags:
 # Quant Infos
 - quants done with an importance matrix for improved quantization loss
-- quantized & generated imatrix from the f32 as f16 is inaccurate when converting from bf16
 - K & IQ quants in basically all variants from Q6_K down to IQ1_S
-Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [b4e4b8a9351d918a56831c73cf9f25c1837b80d1](https://github.com/ggerganov/llama.cpp/commit/b4e4b8a9351d918a56831c73cf9f25c1837b80d1) (master from 2024-04-24)
-Imatrix dataset was used from [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
-Using this command to generate the importance matrix from the f32.gguf
-```
-./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
-```
 # Original Model Card

 # Quant Infos
+## Includes latest bpe tokenizer fixes 🎉
+- Updated for latest bpe pre-tokenizer fixes https://github.com/ggerganov/llama.cpp/pull/6920
 - quants done with an importance matrix for improved quantization loss
 - K & IQ quants in basically all variants from Q6_K down to IQ1_S
+- fixed end token for instruct mode (<|eot_id|>[128009])
+- Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [f4ab2a41476600a98067a9474ea8f9e6db41bcfa](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa) (master from 2024-04-29)
+- Imatrix generated with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) dataset.
+  ```
+  ./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
+  ```
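
The importance matrix written by the command above is typically passed to llama.cpp's `quantize` tool through its `--imatrix` option. The following is a minimal sketch of that step, not the exact invocation used for these quants. It assumes a `./quantize` binary from the same llama.cpp build, reuses the `$model_name` and `$out_path` variables from the command above, and uses an illustrative output naming scheme. The helper `build_quantize_cmds` is hypothetical and only prints the commands (a dry run).

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): print one quantize command
# per requested quant type. Assumes ./quantize from the same llama.cpp build
# and the imatrix file name used in the README; output names are illustrative.
build_quantize_cmds() {
  _model=$1   # model name prefix, e.g. the value of $model_name
  _out=$2     # output directory, e.g. the value of $out_path
  shift 2
  for _qtype in "$@"; do
    printf '%s\n' "./quantize --imatrix $_out/imat-f16-gmerged.dat $_model-f16.gguf $_out/$_model-$_qtype.gguf $_qtype"
  done
}

# Dry run for the two ends of the range named in the README.
build_quantize_cmds "${model_name:-model}" "${out_path:-out}" Q6_K IQ1_S
```

Printing the commands first makes it easy to review them before running anything; the output can then be piped to `sh`, provided the paths contain no spaces.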
 # Original Model Card