TheBloke committed on
Commit
d11efd9
1 Parent(s): 303874e

Update README.md

Files changed (1): README.md (+8 -4)
README.md CHANGED
@@ -39,7 +39,7 @@ The can be used with a new fork of llama.cpp that adds Falcon GGML support: [cmp
 <!-- compatibility_ggml start -->
 ## Compatibility
 
-To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please follow the following steps:
+To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:
 
 ```
 git clone https://github.com/cmp-nct/ggllm.cpp
@@ -48,12 +48,16 @@ git checkout cuda-integration
 rm -rf build && mkdir build && cd build && cmake -DGGML_CUBLAS=1 .. && cmake --build . --config Release
 ```
 
-You can then use `bin/falcon_main` just like you would use llama.cpp. For example:
+Compiling on Windows: developer cmp-nct notes: 'I personally compile it using VScode. When compiling with CUDA support using the Microsoft compiler it's essential to select the "Community edition build tools". Otherwise CUDA won't compile.'
+
+Once compiled you can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 1 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
+bin/falcon_main -t 8 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
 ```
 
-As with llama.cpp, if you can fully offload the model to VRAM you should use `-t 1` for maximum performance. If not, use more threads, eg `-t 8`.
+Using `-ngl 100` will offload all layers to GPU. If you do not have enough VRAM for this, either lower the number or try a smaller quant size as otherwise performance will be severely affected.
+
+Adjust `-t 8` according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
 
 <!-- compatibility_ggml end -->
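Taken together, the README lines touched by this commit amount to a build-then-run sequence. The sketch below assembles the example invocation from the diff without executing the build (so it is safe to run anywhere); the repo URL and model path are the ones quoted in the diff, while the thread-count heuristic (halving `nproc` on SMT machines to approximate physical cores) is an assumption, not something the README prescribes.

```shell
# Build steps from the diff, shown as reference (network + CUDA toolchain required):
#   git clone https://github.com/cmp-nct/ggllm.cpp
#   cd ggllm.cpp
#   rm -rf build && mkdir build && cd build
#   cmake -DGGML_CUBLAS=1 .. && cmake --build . --config Release

# Pick a thread count. nproc reports logical CPUs; the README says not to
# exceed physical cores, so on SMT machines halving is a rough approximation.
LOGICAL=$(nproc)
THREADS=$(( LOGICAL > 1 ? LOGICAL / 2 : 1 ))

# Assemble the example invocation from the diff. -ngl 100 offloads all
# layers to GPU; lower it (or use a smaller quant) if VRAM is insufficient.
CMD="bin/falcon_main -t ${THREADS} -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p \"What is a falcon?\n### Response:\""
echo "${CMD}"
```

The `echo` is deliberate: it prints the command for inspection rather than running a binary that only exists after the build above has succeeded.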