Quantization support.

#1
by AV99 - opened

Are there any plans to release 8-bit support for this?

Add `_no_split_modules = ["CodeT5pBlock"]` to the `CodeT5pEncoderDecoderModel` class in `modeling_codet5p.py`, and `device_map="auto"` should work. Then you can use bitsandbytes for 8-bit inference, which lets you run this model on a 24 GB GPU:
```python
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    load_in_8bit=True,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
```
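For completeness, here is a fuller sketch of the whole flow. The checkpoint name and prompt are placeholders, and the exact generation call may differ for this model (check the model card); it assumes `bitsandbytes` and `accelerate` are installed:

```python
import transformers

checkpoint = "Salesforce/instructcodet5p-16b"  # placeholder: use your checkpoint

tokenizer = transformers.AutoTokenizer.from_pretrained(
    checkpoint, trust_remote_code=True
)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",        # works once _no_split_modules is set
    load_in_8bit=True,        # 8-bit weights via bitsandbytes
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

# Placeholder prompt; some CodeT5+ checkpoints also expect decoder_input_ids,
# so check the model card for the exact generate() call.
inputs = tokenizer("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```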

If you are a Windows user, you can find a bitsandbytes build here: https://github.com/acpopescu/bitsandbytes/releases

Hey Verah, for https://huggingface.co/mosaicml/mpt-7b-instruct, where should I add `_no_split_modules`, and what should its value be?

Thanks in advance.

Are there any plans to release 4-bit support for this? Thanks.
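Not an official answer, but recent transformers versions expose 4-bit loading through the same bitsandbytes route. A minimal sketch, assuming the `_no_split_modules` patch above is in place (the checkpoint name is a placeholder and this is untested for this model):

```python
import torch
import transformers
from transformers import BitsAndBytesConfig

checkpoint = "Salesforce/instructcodet5p-16b"  # placeholder: use your checkpoint

# NF4 quantization with fp16 compute is a common default; not verified here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```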
