Commit bec663b (parent: 15d745b): Update README.md

README.md CHANGED
```diff
@@ -27,7 +27,8 @@ LLM outputs on helpfulness, truthfulness, honesty, and to what extent the answer
 UltraCM-13B is a 13b param LLM that was released by [OpenBMB](https://huggingface.co/openbmb), as part of their paper
 [UltraFeedback: Boosting Language Models with High-quality Feedback](https://arxiv.org/abs/2310.01377).
 
-This model contains the quantized variants using the GGUF format, introduced by the [llama.cpp](https://github.com/ggerganov/llama.cpp) team
+This model contains the quantized variants using the GGUF format, introduced by the [llama.cpp](https://github.com/ggerganov/llama.cpp) team,
+and also heavily inspired by [TheBloke](https://huggingface.co/TheBloke) work on quantizing most of the LLMs out there.
 
 ### Model Details
 
@@ -45,13 +46,18 @@ This model contains the quantized variants using the GGUF format, introduced by
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
+| [UltraCM-13b.q4_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_0.gguf) | Q4_0 | 4 | 3.83 GB| 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
 | [UltraCM-13b.q4_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_s.gguf) | Q4_K_S | 4 | 7.41 GB| 9.91 GB | small, greater quality loss |
-| [UltraCM-13b.q4_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b
-| [UltraCM-13b.
-| [UltraCM-13b.
+| [UltraCM-13b.q4_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_m.gguf) | Q4_K_M | 4 | 7.87 GB| 10.37 GB | medium, balanced quality - recommended |
+| [UltraCM-13b.q5_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_0.gguf) | Q5_0 | 5 | 4.65 GB| 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
+| [UltraCM-13b.q5_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_s.gguf) | Q5_K_S | 5 | 8.97 GB| 11.47 GB | large, low quality loss - recommended |
+| [UltraCM-13b.q5_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_m.gguf) | Q5_K_M | 5 | 9.23 GB| 11.73 GB | large, very low quality loss - recommended |
 
 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
+For more information on quantization, I'd highly suggest anyone reading to go check [TheBloke](https://huggingface.co/TheBloke) out, as well as joining [their
+Discord server](https://discord.gg/Jq4vkcDakD).
+
 ### Uses
 
 #### Direct Use
```
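As a quick usage sketch for the quantized files listed in the table above (not something the commit itself includes), the snippet below downloads one of the quants from the Hub and runs it through the llama.cpp Python bindings. It assumes `huggingface_hub` and `llama-cpp-python` are installed; the repo id and filename come straight from the table, while the context size, prompt, and GPU offload count are illustrative placeholders.

```python
# Minimal sketch (assumes `pip install huggingface_hub llama-cpp-python`):
# fetch one of the GGUF quants from the table and run it locally.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the Q4_K_M quant (~7.87 GB) into the local Hugging Face cache.
model_path = hf_hub_download(
    repo_id="alvarobartt/UltraCM-13b-GGUF",
    filename="UltraCM-13b.q4_k_m.gguf",
)

# n_gpu_layers=0 keeps all layers in system RAM (the figures quoted in the
# table); raising it offloads that many layers to the GPU, using VRAM instead.
llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=0)

output = llm("Briefly, what does UltraFeedback provide?", max_tokens=128)
print(output["choices"][0]["text"])
```

Raising `n_gpu_layers` is what the RAM note refers to: each offloaded layer moves from system RAM into VRAM, so the "Max RAM required" figures shrink accordingly.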