---
license: agpl-3.0
---

This repository contains the unquantized Hermes+LIMARP merge in ggml format.

You can quantize the f16 ggml model to the quantization type of your choice by following the steps below:

  1. Download and extract the llama.cpp binaries (or compile them yourself if you're on Linux).
  2. Move the "quantize" executable to the same folder where you downloaded the f16 ggml model.
  3. Open a command prompt window in that folder and enter the following command, adjusting the file names and quantization type as you see fit:

     quantize.exe hermes-limarp-13b.ggmlv3.f16.bin hermes-limarp-13b.ggmlv3.q4_0.bin q4_0

  4. Press Enter to run the command and the quantized model will be generated in the same folder.
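If you want several quantizations, the same command can be run in a loop. Below is a minimal shell sketch for a Linux/macOS build of llama.cpp, assuming the compiled `quantize` binary sits in the same folder as the f16 model; the list of quantization types is illustrative and can be trimmed to whatever you actually need:

```sh
#!/bin/sh
# Quantize the f16 ggml model to several common ggmlv3 formats.
# Assumes ./quantize (built from llama.cpp) and the f16 model are in the current folder.
for qtype in q4_0 q4_1 q5_0 q5_1 q8_0; do
    ./quantize hermes-limarp-13b.ggmlv3.f16.bin \
               "hermes-limarp-13b.ggmlv3.${qtype}.bin" "${qtype}"
done
```

Smaller types such as q4_0 trade some quality for lower RAM usage, while q8_0 stays closest to the f16 original at a larger file size.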