Commit bec663b (parent: 15d745b): Update README.md

README.md CHANGED
```diff
@@ -27,7 +27,8 @@ LLM outputs on helpfulness, truthfulness, honesty, and to what extent the answer
 UltraCM-13B is a 13b param LLM that was released by [OpenBMB](https://huggingface.co/openbmb), as part of their paper
 [UltraFeedback: Boosting Language Models with High-quality Feedback](https://arxiv.org/abs/2310.01377).
 
-This model contains the quantized variants using the GGUF format, introduced by the [llama.cpp](https://github.com/ggerganov/llama.cpp) team
+This model contains the quantized variants using the GGUF format, introduced by the [llama.cpp](https://github.com/ggerganov/llama.cpp) team,
+and also heavily inspired by [TheBloke](https://huggingface.co/TheBloke) work on quantizing most of the LLMs out there.
 
 ### Model Details
 
@@ -45,13 +46,18 @@ This model contains the quantized variants using the GGUF format, introduced by
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
+| [UltraCM-13b.q4_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_0.gguf) | Q4_0 | 4 | 3.83 GB| 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
 | [UltraCM-13b.q4_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_s.gguf) | Q4_K_S | 4 | 7.41 GB| 9.91 GB | small, greater quality loss |
-| [UltraCM-13b.q4_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b
-| [UltraCM-13b.
-| [UltraCM-13b.
+| [UltraCM-13b.q4_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_m.gguf) | Q4_K_M | 4 | 7.87 GB| 10.37 GB | medium, balanced quality - recommended |
+| [UltraCM-13b.q5_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_0.gguf) | Q5_0 | 5 | 4.65 GB| 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
+| [UltraCM-13b.q5_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_s.gguf) | Q5_K_S | 5 | 8.97 GB| 11.47 GB | large, low quality loss - recommended |
+| [UltraCM-13b.q5_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_m.gguf) | Q5_K_M | 5 | 9.23 GB| 11.73 GB | large, very low quality loss - recommended |
 
 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
+For more information on quantization, I'd highly suggest anyone reading to go check [TheBloke](https://huggingface.co/TheBloke) out, as well as joining [their
+Discord server](https://discord.gg/Jq4vkcDakD).
+
 ### Uses
 
 #### Direct Use
```
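As a quick usage sketch for the quantized files listed in the table above (not something the commit itself includes), the snippet below downloads one of the quants from the Hub and runs it through the llama.cpp Python bindings. It assumes `huggingface_hub` and `llama-cpp-python` are installed; the repo id and filename come straight from the table, while the context size, prompt, and GPU offload count are illustrative placeholders.

```python
# Minimal sketch (assumes `pip install huggingface_hub llama-cpp-python`):
# fetch one of the GGUF quants from the table and run it locally.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the Q4_K_M quant (~7.87 GB) into the local Hugging Face cache.
model_path = hf_hub_download(
    repo_id="alvarobartt/UltraCM-13b-GGUF",
    filename="UltraCM-13b.q4_k_m.gguf",
)

# n_gpu_layers=0 keeps all layers in system RAM (the figures quoted in the
# table); raising it offloads that many layers to the GPU, using VRAM instead.
llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=0)

output = llm("Briefly, what does UltraFeedback provide?", max_tokens=128)
print(output["choices"][0]["text"])
```

Raising `n_gpu_layers` is what the RAM note refers to: each offloaded layer moves from system RAM into VRAM, so the "Max RAM required" figures shrink accordingly.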