Quantization support.

#1
by AV99 - opened

Are there any plans to release 8-bit support for this?

Add `_no_split_modules = ["CodeT5pBlock"]` to the `CodeT5pEncoderDecoderModel` class in `modeling_codet5p.py`, and `device_map="auto"` should work. Then you can use bitsandbytes for 8-bit inference, which lets you run this model on a 24 GB GPU:
```python
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    load_in_8bit=True,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
```
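For completeness, here is a fuller sketch of the whole flow. The checkpoint name and prompt are placeholders, and the exact generation call may differ for this model (check the model card); it assumes `bitsandbytes` and `accelerate` are installed:

```python
import transformers

checkpoint = "Salesforce/instructcodet5p-16b"  # placeholder: use your checkpoint

tokenizer = transformers.AutoTokenizer.from_pretrained(
    checkpoint, trust_remote_code=True
)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",        # works once _no_split_modules is set
    load_in_8bit=True,        # 8-bit weights via bitsandbytes
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

# Placeholder prompt; some CodeT5+ checkpoints also expect decoder_input_ids,
# so check the model card for the exact generate() call.
inputs = tokenizer("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```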

If you are a Windows user, you can find a bitsandbytes build here: https://github.com/acpopescu/bitsandbytes/releases

Hey Verah, for https://huggingface.co/mosaicml/mpt-7b-instruct, where should I add `_no_split_modules`, and what should its value be?

Thanks in advance.

Are there any plans to release 4-bit support for this? Thanks.
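Not an official answer, but recent transformers versions expose 4-bit loading through the same bitsandbytes route. A minimal sketch, assuming the `_no_split_modules` patch above is in place (the checkpoint name is a placeholder and this is untested for this model):

```python
import torch
import transformers
from transformers import BitsAndBytesConfig

checkpoint = "Salesforce/instructcodet5p-16b"  # placeholder: use your checkpoint

# NF4 quantization with fp16 compute is a common default; not verified here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```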
