Update README.md
README.md CHANGED
@@ -83,7 +83,18 @@ Currently, the HuggingFace's Inference Tool UI doesn't properly load the model.
 
 ## CPU
 
-
+Best performance can be achieved by downloading the [GGML 4 bits](https://huggingface.co/webpolis/zenos-gpt-j-6B-instruct-4bit/resolve/main/ggml-f16-q4_0.bin) model and running inference with the [rustformers' llm](https://github.com/rustformers/llm) tool.
+
+On my Core i7 laptop, it runs at around 255 ms per token:
+
+![](https://huggingface.co/webpolis/zenos-gpt-j-6B-instruct-4bit/resolve/main/poema1.gif)
+
+### Requirements
+
+For optimal performance:
+
+- 4 CPU cores
+- 8 GB RAM
 
 # Acknowledgments
 
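For readers following the added CPU section, a minimal Rust sketch of loading the GGML file and streaming tokens with the rustformers' llm crate could look like the following. It mirrors the crate's 0.1-era published example; the model path and prompt are placeholders, and exact signatures (`llm::load`, `InferenceRequest`, the token callback) have shifted between releases, so treat it as an illustration rather than a drop-in snippet.

```rust
// Assumed Cargo.toml dependencies: llm = "0.1" (with GPT-J support) and rand = "0.8".
use std::io::Write;

use llm::Model;

fn main() {
    // Load the quantized GGML weights from disk (path is a placeholder).
    let model = llm::load::<llm::models::GptJ>(
        std::path::Path::new("ggml-f16-q4_0.bin"),
        Default::default(), // llm::ModelParameters
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("Failed to load model: {err}"));

    // Start an inference session and stream generated tokens to stdout.
    let mut session = model.start_session(Default::default());
    let res = session.infer::<std::convert::Infallible>(
        &model,
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt: "Write a short poem about the sea.", // placeholder prompt
            ..Default::default()
        },
        &mut Default::default(), // llm::OutputRequest
        |token| {
            print!("{token}");
            std::io::stdout().flush().unwrap();
            Ok(())
        },
    );

    match res {
        Ok(stats) => println!("\n\nInference stats:\n{stats}"),
        Err(err) => println!("\nInference failed: {err}"),
    }
}
```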