starble-dev committed · Commit f9b858a · Parent: 693dbfe · Update README.md
Use:
```
llama-server.exe -m .\models\Mistral-Nemo-12B-Instruct-2407-Q8_0.gguf -b 512 -ub 512 -c 4096 -ngl 100
```

Set `-b` to the batch size<br>
Set `-ub` to the physical batch size<br>
Set `-c` to the context size<br>
Set `-ngl` to the number of layers to load onto the GPU<br>
Change the path to where the model is actually stored.<br>
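If you launch the server from a script rather than by hand, the same flags can be assembled programmatically. A minimal Python sketch using the example values above (the model path and numbers are just the README's example; adjust them to your setup):

```python
# Build the llama-server command line from the parameters described above.
# Values mirror the README's example; this only constructs and prints the
# command, it does not start the server.
params = {
    "-m": r".\models\Mistral-Nemo-12B-Instruct-2407-Q8_0.gguf",  # model path
    "-b": "512",    # batch size
    "-ub": "512",   # physical batch size
    "-c": "4096",   # context size
    "-ngl": "100",  # layers to load onto the GPU
}

# Flatten the flag/value pairs into an argument list (dicts preserve
# insertion order in Python 3.7+), suitable for subprocess.Popen.
cmd = ["llama-server.exe"] + [tok for kv in params.items() for tok in kv]
print(" ".join(cmd))
```

Passing the list form to `subprocess.Popen(cmd)` avoids shell-quoting issues with the Windows path.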
If you need more clarification on the parameters, check out the [llama.cpp Server Docs](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md).

**License:**