Is the snippet in the README loading the model in 8-bit mode? Passing load_in_8bit=True fails. Does this mean the model runs in 8-bit mode without needing to pass the argument? What should the dtype be?
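For context, this is roughly how I'm checking what actually got loaded. It's a minimal sketch, assuming `model` is the object returned by transformers' from_pretrained; int8-quantized layers should report torch.int8 while the remaining modules stay in fp16/fp32:

```python
# Minimal sketch: inspect which dtypes the loaded weights use.
# Assumes `model` is the object returned by from_pretrained().
from collections import Counter

dtype_counts = Counter(str(p.dtype) for p in model.parameters())
print(dtype_counts)  # int8-quantized layers show up as torch.int8

# Rough in-memory footprint implied by those dtypes.
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"~{total_bytes / 1e9:.1f} GB of parameters")
```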
What is the issue that you are facing?
Could you please provide a script to load this model in plain Python?
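For what it's worth, the generic transformers recipe looks like the sketch below. It assumes a CUDA GPU with enough VRAM, and the repo id is a placeholder; if load_in_8bit=True errors on pre-quantized weights, dropping the flag while keeping device_map="auto" is worth trying:

```python
# Minimal sketch: load a causal LM in 8-bit with transformers + bitsandbytes.
# Requires a CUDA GPU; "your-org/your-model" is a placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # spread layers across available GPUs
    load_in_8bit=True,          # bitsandbytes int8 quantization
    torch_dtype=torch.float16,  # dtype for the non-quantized modules
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```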
It failed to load into oobabooga/text-generation-webui for CPU inference:
RuntimeError: No GPU found. A GPU is needed for quantization.
bitsandbytes 8-bit quantization requires a GPU that can hold the whole model; it is not compatible with CPU inference.
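One way to fail fast instead of hitting that RuntimeError is to check for CUDA before attempting the 8-bit load. A minimal sketch, with the caveat that the CPU fallback only works for repos that actually ship unquantized weights:

```python
# Minimal sketch: guard the 8-bit load, since bitsandbytes'
# int8 kernels are CUDA-only. "your-org/your-model" is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model_id = "your-org/your-model"

if torch.cuda.is_available():
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", load_in_8bit=True
    )
else:
    # CPU inference needs unquantized weights; this branch only
    # works if the repo actually contains them.
    model = AutoModelForCausalLM.from_pretrained(model_id)
```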
The script you provided in the README loads the model in full precision; do we need to pass load_in_8bit? As of now, it downloads the full checkpoints when loaded using .from_pretrained(...)
The weight files are only 40+ GB, versus 90+ GB for full precision. I usually pass load_in_8bit when loading as well, but it should be impossible to load the original full-precision weights from this repo.
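A quick sanity check is to total the downloaded shard sizes on disk; a minimal sketch, assuming the files come through huggingface_hub and using a placeholder repo id. At roughly one byte per int8 parameter versus two per fp16 parameter, 40+ GB of int8 shards lines up with a checkpoint that would be 80-90+ GB unquantized:

```python
# Minimal sketch: total the size of the checkpoint shards on disk.
# snapshot_download() returns the local cache directory for the repo
# (and fetches it first if it isn't cached yet).
from pathlib import Path
from huggingface_hub import snapshot_download

local_dir = Path(snapshot_download("your-org/your-model"))  # placeholder id
shards = list(local_dir.glob("*.bin")) + list(local_dir.glob("*.safetensors"))
total_gb = sum(f.stat().st_size for f in shards) / 1e9
print(f"{len(shards)} shard(s), ~{total_gb:.1f} GB on disk")
```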