After SFT, model inference fails [probability tensor contains either `inf`, `nan` or element < 0]
Traceback (most recent call last):
File "/mnt/bn/intelligent-chatbot/FastChat_v2/LLaMA-Efficient-Tuning/src/inference.py", line 504, in
main(args)
File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/bn/intelligent-chatbot/FastChat_v2/LLaMA-Efficient-Tuning/src/inference.py", line 380, in main
outputs_tokenized = model.generate(**prompts_tokenized, do_sample=True,max_new_tokens=512,pad_token_id=tokenizer.eos_token_id,temperature=0.3)
File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/tiger/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
File "/home/tiger/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 2734, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
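
For context: `torch.multinomial` raises this error when the probability vector it receives contains `inf`/`nan` values, which usually traces back to overflowed logits (a common fp16 symptom) or corrupted weights in the saved checkpoint. A minimal diagnostic sketch to rule out bad weights; the checkpoint path is a placeholder, not a path from this thread:

```python
import torch
from transformers import AutoModelForCausalLM

# "path/to/sft-checkpoint" is a hypothetical placeholder for the fine-tuned
# model directory; load in fp32 so the scan itself cannot overflow.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/sft-checkpoint", torch_dtype=torch.float32
)

# Any non-finite weight means the checkpoint is corrupted and every forward
# pass will propagate bad logits into torch.multinomial.
for name, param in model.named_parameters():
    if not torch.isfinite(param).all():
        print(f"non-finite values in {name}")
```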
Hi @Saicy
How do you run inference? It seems you are using LLaMA-Factory. Can you elaborate on how you're getting that error? Is it after fine-tuning?
Yes, after I finish fine-tuning I can't run inference. I'm running it with the accelerate framework:
outputs_tokenized = model.generate(**prompts_tokenized, do_sample=True, max_new_tokens=512,
                                   pad_token_id=tokenizer.eos_token_id, temperature=0.3)
# Strip the prompt tokens so only the generated continuation remains.
outputs_tokenized = [tok_out[len(tok_in):] for tok_in, tok_out in zip(prompts_tokenized["input_ids"], outputs_tokenized)]
outputs = tokenizer.batch_decode(outputs_tokenized, skip_special_tokens=True)
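
If the weights themselves check out, one workaround is transformers' built-in logit sanitization: `remove_invalid_values=True` (equivalently, passing an explicit `InfNanRemoveLogitsProcessor`) replaces `inf`/`nan` logits before sampling so `torch.multinomial` does not crash. A minimal sketch, assuming `model`, `tokenizer`, and `prompts_tokenized` are set up as in the snippet above:

```python
# remove_invalid_values is a standard transformers generation flag; note it
# masks the symptom (non-finite logits) rather than fixing the root cause.
outputs_tokenized = model.generate(
    **prompts_tokenized,
    do_sample=True,
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,
    temperature=0.3,
    remove_invalid_values=True,
)
```

If the outputs are still degenerate after this, reloading the checkpoint in `torch.float32` or `torch.bfloat16` instead of fp16 is often the actual fix, since LLaMA-family logits can overflow in fp16.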