brucethemoose committed
Commit 6210430 • 1 Parent(s): 7ea62e3
Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ It might recognize ChatML, and possibly Alpaca-like formats. Raw prompting as de
## Running

-Being a Yi model, run a lower temperature with 0.
+Being a Yi model, run a lower temperature with 0.1 or higher MinP, a little repetition penalty, maybe mirostat with a low tau, and no other samplers. Yi tends to run "hot" by default, and it really needs a low temperature + MinP to cull Yi's huge vocabulary. See the explanation here: https://github.com/ggerganov/llama.cpp/pull/3841

24GB GPUs can efficiently run Yi-34B-200K models at **40K-90K context** with exllamav2, and performant UIs like [exui](https://github.com/turboderp/exui). I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/). 16GB GPUs can still run the high context with aggressive quantization.
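As a concrete illustration of the new sampler advice, here is a minimal sketch using llama-cpp-python (Python bindings for llama.cpp, where the linked MinP PR landed). The model filename, context size, and exact numbers are placeholder assumptions, not settings taken from this card:

```python
# Minimal sketch of the recommended samplers with llama-cpp-python.
# File name, context length, and values are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="yi-34b-200k.Q4_K_M.gguf",  # hypothetical GGUF quant
    n_ctx=32768,                           # raise as VRAM/RAM allows
    n_gpu_layers=-1,                       # offload all layers that fit
)

out = llm(
    "Summarize the following document:\n...",
    max_tokens=512,
    temperature=0.7,     # lower than the usual default; Yi runs "hot"
    min_p=0.1,           # MinP of 0.1 or higher to cull the huge vocabulary tail
    repeat_penalty=1.1,  # a little repetition penalty
    top_p=1.0,           # leave the other samplers effectively disabled
    top_k=0,
    # Or, instead of low temperature + MinP, mirostat with a low tau:
    # mirostat_mode=2, mirostat_tau=3.0,
)
print(out["choices"][0]["text"])
```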
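For the long-context point, a rough sketch of loading an exl2-quantized Yi-34B-200K with exllamav2 at an extended window; the model directory, bits-per-weight, and the 64K target are assumptions for illustration, not figures from the card or the linked post:

```python
# Rough sketch: long-context loading of an exl2 quant with exllamav2.
# Directory, quant level, and sequence length are hypothetical.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Yi-34B-200K-exl2-4.0bpw"  # hypothetical quant directory
config.prepare()
config.max_seq_len = 65536  # somewhere in the 40K-90K range discussed above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # 8-bit cache stretches context further
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7               # same low-temperature + MinP advice as above
settings.min_p = 0.1
settings.token_repetition_penalty = 1.1

print(generator.generate_simple("Hello, Yi.", settings, num_tokens=64))
```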