Cannot run model with torch.float16
I cannot run this model with torch.float16. In addition, the model loads more slowly than expected.
I ran the following code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

text = "自然言語処理とは何か"  # "What is natural language processing?"
text = text + "### 回答："     # "### Answer:" prompt marker
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
    )[0]
print(tokenizer.decode(output))
Then, I got the following error:
/home/username/anaconda3/envs/pdf-agent/bin/python /mnt/my_raid/github/pdf-agent/llm_jp_13b.py
Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00, 2.79s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:7 for open-end generation.
Traceback (most recent call last):
File "/mnt/my_raid/github/pdf-agent/llm_jp_13b.py", line 10, in <module>
output = model.generate(
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
return self.sample(
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
outputs = self(
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1076, in forward
transformer_outputs = self.transformer(
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
outputs = block(
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 389, in forward
hidden_states = self.ln_1(hidden_states)
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
However, I can run the model with torch.float32:
- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
I can also run the model with load_in_8bit:
- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)
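The same 8-bit load can also be spelled with a BitsAndBytesConfig; a sketch, assuming model_name as defined above and bitsandbytes/accelerate installed as in the pip freeze below:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)   # same effect as load_in_8bit=True
model = AutoModelForCausalLM.from_pretrained(
    model_name,                       # same checkpoint as above
    quantization_config=bnb_config,
    device_map="auto",                # place the quantized weights on the GPU(s)
)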
My environment is as follows:
- OS: Ubuntu 20.04
- CPU: Core i5-12400F
- GPU: RTX A6000 48GBx2
- Memory: 128GB
- Pip freeze:
- accelerate==0.23.0
- bitsandbytes==0.41.1
- tokenizers==0.14.1
- transformers==4.34.1
- torch==2.0.1
What should I do?
OK, I got it.
I was running the code on the CPU with torch.float16: without a device_map, the model stays on the CPU, where the fp16 LayerNorm kernel is not implemented.
Following Mr. Sasaki's advice, I changed the code as follows:
- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
The code now runs on the GPU.
The problem is solved.
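For reference, a quick way to confirm that the weights actually landed on the GPU in float16 (a minimal check, assuming the corrected load call above):

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
param = next(model.parameters())
print(param.device, param.dtype)   # expect cuda:0 / torch.float16; "cpu" here is what triggers the LayerNorm error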
In your environment, you need to add the argument device_map="auto" to AutoModelForCausalLM.from_pretrained() and set the environment variable CUDA_VISIBLE_DEVICES=0.
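A minimal sketch of that setup: set the variable in the shell (CUDA_VISIBLE_DEVICES=0 python llm_jp_13b.py) or at the very top of the script, before CUDA is initialized:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # restrict to the first GPU; must be set before CUDA is initialized

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0",
    torch_dtype=torch.float16,
    device_map="auto",    # lets accelerate place the fp16 weights on the visible GPU
)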
By the way, which version of Python are you using in this environment?
Thank you for your reply.
My Python version is 3.10.13, installed with anaconda3.