TheBloke committed on
Commit
a30ebd3
1 Parent(s): d5f54d0

Upload README.md

Files changed (1)
  1. README.md +104 -40
README.md CHANGED
@@ -47,20 +47,17 @@ GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is
47
 
48
  The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including, for the first time, full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.
49
 
50
- As of August 24th 2023, llama.cpp and KoboldCpp support GGUF. Other third-party clients and libraries are expected to add support very soon.
51
-
52
- Here is a list of clients and libraries that are known to support GGUF:
53
- * [llama.cpp](https://github.com/ggerganov/llama.cpp)
54
- * [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41!
55
-
56
- Here is a list of clients and libraries, along with their expected timeline for GGUF support. Where possible a link to the relevant issue or PR is provided:
57
- * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), awaiting llama-cpp-python support.
58
- * [LM Studio](https://lmstudio.ai/), in active development - hoped to be ready by August 25th-26th.
59
- * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), will work as soon as ctransformers or llama-cpp-python is updated.
60
- * [ctransformers](https://github.com/marella/ctransformers), [development will start soon](https://github.com/marella/ctransformers/issues/102).
61
- * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [in active development](https://github.com/abetlen/llama-cpp-python/issues/628).
62
- <!-- README_GGUF.md-about-gguf end -->
63
 
 
64
  <!-- repositories-available start -->
65
  ## Repositories available
66
 
@@ -79,6 +76,7 @@ Here is a list of clients and libraries, along with their expected timeline for
79
  {prompt}
80
 
81
  ### Response:
 
82
  ```
83
 
84
  <!-- prompt-template end -->
@@ -87,9 +85,7 @@ Here is a list of clients and libraries, along with their expected timeline for
87
 
88
  These quantised GGUF files are compatible with llama.cpp from August 21st 2023 onwards, as of commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9)
89
 
90
- As of August 24th 2023 they are now compatible with KoboldCpp, release 1.41 and later.
91
-
92
- They are not yet compatible with any other third-party UIs, libraries or utilities, but this is expected to change very soon.
93
 
94
  ## Explanation of quantisation methods
95
  <details>
@@ -111,16 +107,22 @@ Refer to the Provided Files table below to see what files use which methods, and
111
 
112
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
113
  | ---- | ---- | ---- | ---- | ---- | ----- |
114
- | [nous-hermes-llama2-70b.Q2_K.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q2_K.gguf) | Q2_K | 2 | 29.48 GB| 31.98 GB | smallest, significant quality loss - not recommended for most purposes |
115
- | [nous-hermes-llama2-70b.Q3_K_S.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q3_K_S.gguf) | Q3_K_S | 3 | 30.09 GB| 32.59 GB | very small, high quality loss |
116
- | [nous-hermes-llama2-70b.Q3_K_M.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q3_K_M.gguf) | Q3_K_M | 3 | 33.45 GB| 35.95 GB | very small, high quality loss |
117
- | [nous-hermes-llama2-70b.Q3_K_L.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q3_K_L.gguf) | Q3_K_L | 3 | 36.49 GB| 38.99 GB | small, substantial quality loss |
118
- | [nous-hermes-llama2-70b.Q4_K_S.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q4_K_S.gguf) | Q4_K_S | 4 | 39.30 GB| 41.80 GB | small, greater quality loss |
119
- | [nous-hermes-llama2-70b.Q4_K_M.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q4_K_M.gguf) | Q4_K_M | 4 | 41.69 GB| 44.19 GB | medium, balanced quality - recommended |
120
- | [nous-hermes-llama2-70b.Q5_K_S.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q5_K_S.gguf) | Q5_K_S | 5 | 47.74 GB| 50.24 GB | large, low quality loss - recommended |
121
- | [nous-hermes-llama2-70b.Q5_K_M.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q5_K_M.gguf) | Q5_K_M | 5 | 49.03 GB| 51.53 GB | large, very low quality loss - recommended |
122
- | nous-hermes-llama2-70b.Q6_K.bin | q6_K | 6 | 56.82 GB | 59.32 GB | very large, extremely low quality loss |
123
- | nous-hermes-llama2-70b.Q8_0.bin | q8_0 | 8 | 73.29 GB | 75.79 GB | very large, extremely low quality loss - not recommended |
124
 
125
  **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
126
 
@@ -143,7 +145,7 @@ Please download:
143
 
144
  To join the files, do the following:
145
 
146
- Linux:
147
  ```
148
  cat nous-hermes-llama2-70b.Q6_K.gguf-split-* > nous-hermes-llama2-70b.Q6_K.gguf && rm nous-hermes-llama2-70b.Q6_K.gguf-split-*
149
  cat nous-hermes-llama2-70b.Q8_0.gguf-split-* > nous-hermes-llama2-70b.Q8_0.gguf && rm nous-hermes-llama2-70b.Q8_0.gguf-split-*
@@ -158,9 +160,71 @@ del nous-hermes-llama2-70b.Q8_0.gguf-split-a nous-hermes-llama2-70b.Q8_0.gguf-sp
158
  ```
159
 
160
  </details>
161
-
162
  <!-- README_GGUF.md-provided-files end -->
163
 
164
  <!-- footer start -->
165
  <!-- 200823 -->
166
  ## Discord
@@ -184,7 +248,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
184
 
185
  **Special thanks to**: Aemon Algiz.
186
 
187
- **Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
188
 
189
 
190
  Thank you to all my generous patrons and donaters!
@@ -216,16 +280,16 @@ The model was trained almost entirely on synthetic GPT-4 outputs. Curating high
216
  This includes data from diverse sources such as GPTeacher, the general, roleplay v1&2, code instruct datasets, Nous Instruct & PDACTL (unpublished), and several others, detailed further below
217
 
218
  ## Collaborators
219
- The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Emozilla, Huemin Art, and Pygmalion AI.
220
-
221
  Special mention goes to @winglian for assisting in some of the training issues.
222
 
223
- Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.
224
 
225
  Among the contributors of datasets:
226
  - GPTeacher was made available by Teknium
227
  - Wizard LM by nlpxucan
228
- - Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
229
  - GPT4-LLM and Unnatural Instructions were provided by Microsoft
230
  - Airoboros dataset by jondurbin
231
  - Camel-AI's domain expert datasets are from Camel-AI
@@ -245,7 +309,7 @@ The model follows the Alpaca prompt format:
245
 
246
  ```
247
 
248
- or
249
 
250
  ```
251
  ### Instruction:
@@ -261,7 +325,7 @@ or
261
 
262
  ## Benchmarks:
263
 
264
- GPT4All Suite:
265
 
266
  ```
267
  hf-causal-experimental (pretrained=/home/data/axolotl/Nous-Hermes-Llama2-70b,dtype=float16,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
@@ -306,7 +370,7 @@ hf-causal-experimental (pretrained=/home/data/axolotl/Nous-Hermes-Llama2-70b,dty
306
  |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2168|± |0.0117|
307
  |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1531|± |0.0086|
308
  |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4467|± |0.0288|
309
- ```
310
 
311
  AGIEval:
312
  ```
@@ -329,15 +393,15 @@ hf-causal-experimental (pretrained=/home/data/axolotl/Nous-Hermes-Llama2-70b,dty
329
  | | |acc_norm|0.4709|± |0.0349|
330
  |agieval_sat_math | 0|acc |0.4136|± |0.0333|
331
  | | |acc_norm|0.3455|± |0.0321|
332
- ```
333
 
334
  ## Resources for Applied Use Cases:
335
  Check out LM Studio for a nice chatgpt style interface here: https://lmstudio.ai/
336
- For an example of a back and forth chatbot using huggingface transformers and discord, check out: https://github.com/teknium1/alpaca-discord
337
- For an example of a roleplaying discord chatbot, check out this: https://github.com/teknium1/alpaca-roleplay-discordbot
338
 
339
  ## Future Plans
340
- We plan to continue to iterate on both more high quality data, and new data filtering techniques to eliminate lower quality data going forward.
341
 
342
  ## Model Usage
343
  The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
 
47
 
48
  The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including, for the first time, full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.
49
 
50
+ Here is a list of clients and libraries that are known to support GGUF:
51
+ * [llama.cpp](https://github.com/ggerganov/llama.cpp).
52
+ * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI. Supports GGUF with GPU acceleration via the ctransformers backend - llama-cpp-python backend should work soon too.
53
+ * [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41! A powerful GGML web UI, with full GPU acceleration. Especially good for storytelling.
54
+ * [LM Studio](https://lmstudio.ai/), supports GGUF in version 0.2.2 and later. A fully featured local GUI with GPU acceleration on both Windows (NVIDIA and AMD) and macOS.
55
+ * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work; choose the `c_transformers` backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.
56
+ * [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
57
+ * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), supports GGUF as of version 0.1.79. A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
58
+ * [candle](https://github.com/huggingface/candle), added GGUF support on August 22nd. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.
59
 
60
+ <!-- README_GGUF.md-about-gguf end -->
61
  <!-- repositories-available start -->
62
  ## Repositories available
63
 
 
76
  {prompt}
77
 
78
  ### Response:
79
+
80
  ```
81
 
82
  <!-- prompt-template end -->
 
85
 
86
  These quantised GGUF files are compatible with llama.cpp from August 21st 2023 onwards, as of commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9)
87
 
88
+ They are now also compatible with many third-party UIs and libraries - please see the list at the top of the README.
 
 
89
 
90
  ## Explanation of quantisation methods
91
  <details>
 
107
 
108
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
109
  | ---- | ---- | ---- | ---- | ---- | ----- |
110
+ | [nous-hermes-llama2-70b.Q6_K.gguf-split-b](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q6_K.gguf-split-b) | Q6_K | 6 | 19.89 GB| 22.39 GB | very large, extremely low quality loss |
111
+ | [nous-hermes-llama2-70b.Q2_K.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q2_K.gguf) | Q2_K | 2 | 29.28 GB| 31.78 GB | smallest, significant quality loss - not recommended for most purposes |
112
+ | [nous-hermes-llama2-70b.Q3_K_S.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q3_K_S.gguf) | Q3_K_S | 3 | 29.92 GB| 32.42 GB | very small, high quality loss |
113
+ | [nous-hermes-llama2-70b.Q3_K_M.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q3_K_M.gguf) | Q3_K_M | 3 | 33.19 GB| 35.69 GB | very small, high quality loss |
114
+ | [nous-hermes-llama2-70b.Q3_K_L.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q3_K_L.gguf) | Q3_K_L | 3 | 36.15 GB| 38.65 GB | small, substantial quality loss |
115
+ | [nous-hermes-llama2-70b.Q8_0.gguf-split-b](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q8_0.gguf-split-b) | Q8_0 | 8 | 36.59 GB| 39.09 GB | very large, extremely low quality loss - not recommended |
116
+ | [nous-hermes-llama2-70b.Q6_K.gguf-split-a](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q6_K.gguf-split-a) | Q6_K | 6 | 36.70 GB| 39.20 GB | very large, extremely low quality loss |
117
+ | [nous-hermes-llama2-70b.Q8_0.gguf-split-a](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q8_0.gguf-split-a) | Q8_0 | 8 | 36.70 GB| 39.20 GB | very large, extremely low quality loss - not recommended |
118
+ | [nous-hermes-llama2-70b.Q4_0.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q4_0.gguf) | Q4_0 | 4 | 38.87 GB| 41.37 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
119
+ | [nous-hermes-llama2-70b.Q4_K_S.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q4_K_S.gguf) | Q4_K_S | 4 | 39.07 GB| 41.57 GB | small, greater quality loss |
120
+ | [nous-hermes-llama2-70b.Q4_K_M.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q4_K_M.gguf) | Q4_K_M | 4 | 41.42 GB| 43.92 GB | medium, balanced quality - recommended |
121
+ | [nous-hermes-llama2-70b.Q5_0.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q5_0.gguf) | Q5_0 | 5 | 47.46 GB| 49.96 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
122
+ | [nous-hermes-llama2-70b.Q5_K_S.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q5_K_S.gguf) | Q5_K_S | 5 | 47.46 GB| 49.96 GB | large, low quality loss - recommended |
123
+ | [nous-hermes-llama2-70b.Q5_K_M.gguf](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-70B-GGUF/blob/main/nous-hermes-llama2-70b.Q5_K_M.gguf) | Q5_K_M | 5 | 48.75 GB| 51.25 GB | large, very low quality loss - recommended |
124
+ | nous-hermes-llama2-70b.Q6_K.gguf | Q6_K | 6 | 56.59 GB| 59.09 GB | very large, extremely low quality loss |
125
+ | nous-hermes-llama2-70b.Q8_0.gguf | Q8_0 | 8 | 73.29 GB| 75.79 GB | very large, extremely low quality loss - not recommended |
126
 
127
  **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
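
In the table above, the "Max RAM required" column appears to be the file size plus a constant 2.50 GB (for example, 41.42 GB + 2.50 GB = 43.92 GB for Q4_K_M). The sketch below uses that observation to pick the largest quant that fits a given RAM budget with no GPU offloading. The 2.50 GB overhead is inferred from the table rather than documented anywhere, and the function names are hypothetical.

```python
# File sizes in GB, copied from the Provided Files table
# (Q6_K and Q8_0 are the sizes of the full files after joining the splits).
QUANT_SIZES_GB = {
    "Q2_K": 29.28,
    "Q3_K_S": 29.92,
    "Q3_K_M": 33.19,
    "Q3_K_L": 36.15,
    "Q4_0": 38.87,
    "Q4_K_S": 39.07,
    "Q4_K_M": 41.42,
    "Q5_0": 47.46,
    "Q5_K_S": 47.46,
    "Q5_K_M": 48.75,
    "Q6_K": 56.59,
    "Q8_0": 73.29,
}

# Assumption: inferred from the table (Max RAM minus Size), not an official figure.
RAM_OVERHEAD_GB = 2.50


def estimate_max_ram_gb(size_gb):
    """Estimate RAM use for a file of the given size, with no GPU offloading."""
    return round(size_gb + RAM_OVERHEAD_GB, 2)


def largest_quant_that_fits(ram_budget_gb):
    """Return the name of the largest quant whose estimated RAM fits, or None."""
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if estimate_max_ram_gb(size) <= ram_budget_gb
    ]
    return max(fitting)[1] if fitting else None
```

With these numbers, `largest_quant_that_fits(48)` returns `"Q4_K_M"`. Offloading layers to the GPU lowers RAM use, so the estimate is conservative in that case.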
128
 
 
145
 
146
  To join the files, do the following:
147
 
148
+ Linux and macOS:
149
  ```
150
  cat nous-hermes-llama2-70b.Q6_K.gguf-split-* > nous-hermes-llama2-70b.Q6_K.gguf && rm nous-hermes-llama2-70b.Q6_K.gguf-split-*
151
  cat nous-hermes-llama2-70b.Q8_0.gguf-split-* > nous-hermes-llama2-70b.Q8_0.gguf && rm nous-hermes-llama2-70b.Q8_0.gguf-split-*
 
160
  ```
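
As a cross-platform alternative to the shell commands above, the same join can be sketched in Python. This is a minimal sketch, not part of the original instructions: `join_split_files` is a hypothetical helper, and it assumes the parts sort correctly by name (`-split-a`, `-split-b`, ...).

```python
import glob
import os
import shutil


def join_split_files(target_path, delete_parts=False):
    """Concatenate `<target_path>-split-*` parts, in sorted order, into target_path."""
    parts = sorted(glob.glob(glob.escape(target_path) + "-split-*"))
    if not parts:
        raise FileNotFoundError(f"no split parts found for {target_path}")
    with open(target_path, "wb") as joined:
        for part in parts:
            with open(part, "rb") as chunk:
                # Stream in blocks so multi-gigabyte parts never sit fully in memory.
                shutil.copyfileobj(chunk, joined)
    if delete_parts:
        # Mirrors the `rm` / `del` step of the shell commands.
        for part in parts:
            os.remove(part)
    return parts
```

For example, `join_split_files("nous-hermes-llama2-70b.Q6_K.gguf", delete_parts=True)` would write the joined file next to the parts and then remove them.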
161
 
162
  </details>
 
163
  <!-- README_GGUF.md-provided-files end -->
164
 
165
+ <!-- README_GGUF.md-how-to-run start -->
166
+ ## Example `llama.cpp` command
167
+
168
+ Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9) or later.
169
+
170
+ For compatibility with older versions of llama.cpp, or for any third-party libraries or clients that haven't yet updated for GGUF, please use GGML files instead.
171
+
172
+ ```
173
+ ./main -t 10 -ngl 32 -m nous-hermes-llama2-70b.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction:\n\nWrite a story about llamas\n\n### Response:"
174
+ ```
175
+ Change `-t 10` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`. If offloading all layers to GPU, set `-t 1`.
176
+
177
+ Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
178
+
179
+ Change `-c 4096` to the desired sequence length for this model. For extended sequence models (e.g. 8K, 16K, 32K), the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
180
+
181
+ If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
182
+
183
+ For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
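
The tuning advice above can also be captured in a small Python helper that assembles the argument list for `./main`. This is only a sketch: `build_main_command` is a hypothetical name, the flags are the ones from the example command, and the prompt is built with real newlines instead of the literal `\n` sequences used in the shell example.

```python
import shlex


def build_main_command(model_path, instruction, threads=10, gpu_layers=32,
                       context=4096, temp=0.7, repeat_penalty=1.1):
    """Return the argument list for llama.cpp's ./main, mirroring the example command."""
    prompt = f"### Instruction:\n\n{instruction}\n\n### Response:"
    args = ["./main", "-t", str(threads)]
    if gpu_layers > 0:
        # Omit -ngl entirely when there is no GPU acceleration.
        args += ["-ngl", str(gpu_layers)]
    args += ["-m", model_path, "--color", "-c", str(context),
             "--temp", str(temp), "--repeat_penalty", str(repeat_penalty),
             "-n", "-1", "-p", prompt]
    return args


if __name__ == "__main__":
    cmd = build_main_command("nous-hermes-llama2-70b.Q4_K_M.gguf",
                             "Write a story about llamas", threads=8)
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with the multi-line prompt.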
184
+
185
+ ## How to run in `text-generation-webui`
186
+
187
+ Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).
188
+
189
+ ## How to run from Python code
190
+
191
+ You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
192
+
193
+ ### How to load this model from Python using ctransformers
194
+
195
+ #### First install the package
196
+
197
+ ```bash
198
+ # Base ctransformers with no GPU acceleration (quoted so the shell does not treat > as a redirect)
199
+ pip install "ctransformers>=0.2.24"
200
+ # Or with CUDA GPU acceleration
201
+ pip install "ctransformers[cuda]>=0.2.24"
202
+ # Or with ROCm GPU acceleration
203
+ CT_HIPBLAS=1 pip install "ctransformers>=0.2.24" --no-binary ctransformers
204
+ # Or with Metal GPU acceleration for macOS systems
205
+ CT_METAL=1 pip install "ctransformers>=0.2.24" --no-binary ctransformers
206
+ ```
207
+
208
+ #### Simple example code to load one of these GGUF models
209
+
210
+ ```python
211
+ from ctransformers import AutoModelForCausalLM
212
+
213
+ # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
214
+ llm = AutoModelForCausalLM.from_pretrained("TheBloke/Nous-Hermes-Llama2-70B-GGUF", model_file="nous-hermes-llama2-70b.Q4_K_M.gguf", model_type="llama", gpu_layers=50)
215
+
216
+ print(llm("AI is going to"))
217
+ ```
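
The README names llama-cpp-python as the other Python option but shows no example for it. A rough equivalent might look like the sketch below. It assumes `llama-cpp-python` 0.1.79 or later is installed and that the GGUF file has already been downloaded locally. The helper names and the `max_tokens=256` limit are illustrative choices, and the prompt layout mirrors the `llama.cpp` command earlier in this README.

```python
def alpaca_prompt(instruction):
    """Wrap an instruction in the Alpaca-style layout used by the llama.cpp example."""
    return f"### Instruction:\n\n{instruction}\n\n### Response:"


def run_with_llama_cpp_python(model_path, instruction, gpu_layers=32, threads=10):
    """Load a local GGUF file with llama-cpp-python and return the completion text."""
    # Imported lazily so this module can be imported without the package installed.
    from llama_cpp import Llama

    llm = Llama(
        model_path=model_path,    # path to a downloaded .gguf file
        n_ctx=4096,               # sequence length, as with -c 4096
        n_gpu_layers=gpu_layers,  # layers to offload, as with -ngl; 0 for CPU only
        n_threads=threads,        # physical CPU cores, as with -t
    )
    result = llm(
        alpaca_prompt(instruction),
        max_tokens=256,           # illustrative limit, not from the README
        temperature=0.7,
        repeat_penalty=1.1,
    )
    return result["choices"][0]["text"]
```

A call such as `run_with_llama_cpp_python("nous-hermes-llama2-70b.Q4_K_M.gguf", "Write a story about llamas")` would then return the generated text as a string.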
218
+
219
+ ## How to use with LangChain
220
+
221
+ Here are guides on using llama-cpp-python or ctransformers with LangChain:
222
+
223
+ * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
224
+ * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
225
+
226
+ <!-- README_GGUF.md-how-to-run end -->
227
+
228
  <!-- footer start -->
229
  <!-- 200823 -->
230
  ## Discord
 
248
 
249
  **Special thanks to**: Aemon Algiz.
250
 
251
+ **Patreon special mentions**: Russ Johnson, J, alfie_i, Alex, NimbleBox.ai, Chadd, Mandus, Nikolai Manek, Ken Nordquist, ya boyyy, Illia Dulskyi, Viktor Bowallius, vamX, Iucharbius, zynix, Magnesian, Clay Pascal, Pierre Kircher, Enrico Ros, Tony Hughes, Elle, Andrey, knownsqashed, Deep Realms, Jerry Meng, Lone Striker, Derek Yates, Pyrater, Mesiah Bishop, James Bentley, Femi Adebogun, Brandon Frisco, SuperWojo, Alps Aficionado, Michael Dempsey, Vitor Caleffi, Will Dee, Edmond Seymore, usrbinkat, LangChain4j, Kacper Wikieł, Luke Pendergrass, John Detwiler, theTransient, Nathan LeClaire, Tiffany J. Kim, biorpg, Eugene Pentland, Stanislav Ovsiannikov, Fred von Graf, terasurfer, Kalila, Dan Guido, Nitin Borwankar, 阿明, Ai Maven, John Villwock, Gabriel Puliatti, Stephen Murray, Asp the Wyvern, danny, Chris Smitley, ReadyPlayerEmma, S_X, Daniel P. Andersen, Olakabola, Jeffrey Morgan, Imad Khwaja, Caitlyn Gatomon, webtim, Alicia Loh, Trenton Dambrowitz, Swaroop Kallakuri, Erik Bjäreholt, Leonard Tan, Spiking Neurons AB, Luke @flexchar, Ajan Kanaga, Thomas Belote, Deo Leter, RoA, Willem Michiel, transmissions 11, subjectnull, Matthew Berman, Joseph William Delisle, David Ziegler, Michael Davis, Johann-Peter Hartmann, Talal Aujan, senxiiz, Artur Olbinski, Rainer Wilmers, Spencer Kim, Fen Risland, Cap'n Zoog, Rishabh Srivastava, Michael Levine, Geoffrey Montalvo, Sean Connelly, Alexandros Triantafyllidis, Pieter, Gabriel Tamborski, Sam, Subspace Studios, Junyu Yang, Pedro Madruga, Vadim, Cory Kujawski, K, Raven Klaugh, Randy H, Mano Prime, Sebastain Graf, Space Cruiser
252
 
253
 
254
  Thank you to all my generous patrons and donaters!
 
280
  This includes data from diverse sources such as GPTeacher, the general, roleplay v1&2, code instruct datasets, Nous Instruct & PDACTL (unpublished), and several others, detailed further below
281
 
282
  ## Collaborators
283
+ The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Emozilla, Huemin Art, and Pygmalion AI.
284
+
285
  Special mention goes to @winglian for assisting in some of the training issues.
286
 
287
+ Huge shoutout and acknowledgement is deserved for all the dataset creators who generously share their datasets openly.
288
 
289
  Among the contributors of datasets:
290
  - GPTeacher was made available by Teknium
291
  - Wizard LM by nlpxucan
292
+ - Nous Research Instruct Dataset was provided by Karan4D and HueminArt.
293
  - GPT4-LLM and Unnatural Instructions were provided by Microsoft
294
  - Airoboros dataset by jondurbin
295
  - Camel-AI's domain expert datasets are from Camel-AI
 
309
 
310
  ```
311
 
312
+ or
313
 
314
  ```
315
  ### Instruction:
 
325
 
326
  ## Benchmarks:
327
 
328
+ GPT4All Suite:
329
 
330
  ```
331
  hf-causal-experimental (pretrained=/home/data/axolotl/Nous-Hermes-Llama2-70b,dtype=float16,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
 
370
  |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2168|± |0.0117|
371
  |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1531|± |0.0086|
372
  |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4467|± |0.0288|
373
+ ```
374
 
375
  AGIEval:
376
  ```
 
393
  | | |acc_norm|0.4709|± |0.0349|
394
  |agieval_sat_math | 0|acc |0.4136|± |0.0333|
395
  | | |acc_norm|0.3455|± |0.0321|
396
+ ```
397
 
398
  ## Resources for Applied Use Cases:
399
  Check out LM Studio for a nice chatgpt style interface here: https://lmstudio.ai/
400
+ For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, check out: https://github.com/teknium1/alpaca-discord
401
+ For an example of a roleplaying Discord chatbot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot
402
 
403
  ## Future Plans
404
+ We plan to keep iterating on both higher-quality data and new data filtering techniques that eliminate lower-quality data.
405
 
406
  ## Model Usage
407
  The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.