Update README.md
README.md
CHANGED
@@ -30,7 +30,8 @@ Please note that these GGMLs are **not compatible with llama.cpp, or currently w
 
 ## Prompt template: orca
 
-
+```
+<system>: You are a helpful assistant
 
 <human>: {prompt}
 
@@ -66,7 +67,6 @@ As other options become available I will endeavour to update them here (do let m
 | mpt-30b-dolphin-v2.ggmlv1.q5_1.bin | q5_1 | 5 | 22.47 GB| 24.97 GB | 5-bit. Even higher accuracy, resource usage and slower inference. |
 | mpt-30b-dolphin-v2.ggmlv1.q8_0.bin | q8_0 | 8 | 31.83 GB| 34.33 GB | 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
 
-
 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
 <!-- footer start -->
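For illustration, the orca template added in the first hunk could be assembled with a minimal sketch like the following. Only the `<system>:` and `<human>:` lines appear in the hunk; the function name is an assumption, and any trailing assistant-turn marker falls outside the diff, so none is emitted here.

```python
def make_orca_prompt(user_prompt: str,
                     system_message: str = "You are a helpful assistant") -> str:
    """Assemble a prompt following the orca template shown in the diff.

    The <system>: and <human>: tags and the default system message come
    from the README hunk; everything else here is illustrative.
    """
    return f"<system>: {system_message}\n\n<human>: {user_prompt}\n"


# Example usage:
# make_orca_prompt("Summarize this text.")
```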