Update README.md
README.md CHANGED
@@ -8,6 +8,12 @@ These files are the result of merging the [delta weights](https://huggingface.co
The code for merging is provided in the [WizardLM official Github repo](https://github.com/nlpxucan/WizardLM).

+The original WizardLM deltas are in float32, which results in an HF repo that is also float32 and therefore much larger than a normal 7B Llama model.
+
+Therefore, for this repo I converted the model to float16 to produce a standard-size 7B model.
+
+This was achieved by running **`model = model.half()`** prior to saving.
+
## WizardLM-7B HF
This repo contains the full unquantised model files in HF format for GPU inference and as a base for quantisation/conversion.
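
For reference, here is a minimal sketch of the float32 → float16 conversion described in the added lines, assuming the Hugging Face `transformers` API. The paths are hypothetical placeholders, not this repo's actual paths.

```python
# Minimal sketch: load the merged float32 model, cast to float16, and save.
# The paths below are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "path/to/merged-wizardlm-7b"   # float32 result of merging the WizardLM deltas
output_path = "path/to/wizardlm-7b-fp16"     # destination for the float16 model

model = AutoModelForCausalLM.from_pretrained(merged_path)
tokenizer = AutoTokenizer.from_pretrained(merged_path)

model = model.half()                         # cast all weights to float16

model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```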