Empty response
Hello
I'm facing a very odd issue while running the following code:
#################
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the fp16 model weights, then move the model to the GPU
tokenizer = AutoTokenizer.from_pretrained(afs_path + "nomic-ai/gpt4all-j")
model = AutoModelForCausalLM.from_pretrained(afs_path + "nomic-ai/gpt4all-j", torch_dtype=torch.float16, revision="v1.2-jazzy")
model = model.to('cuda:0')

# Build the summarization prompt around the input text and tokenize it
prompt = f"""The task is to write a summary of the following text: {input}"""
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]

# Sample up to 128 new tokens
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)

# Decode only the newly generated tokens (everything after the prompt)
tokens = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(tokens)
print(output_str)
#################
Specifically, the cell executes successfully, but the response is empty: the only output is the warning "Setting pad_token_id to eos_token_id:50256 for open-end generation." followed by "<|endoftext|>".
The prompt is 714 tokens, which is well below this model's 2048-token limit. In addition, when the cell is run multiple times it occasionally produces a reasonable output, but seemingly at random. The issue does not appear with relatively short inputs (approx. 250 tokens or fewer). I have played with the parameters of model.generate, but the issue remains the same. I have also checked that the GPU is NOT out of memory.
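For completeness, this is roughly how I performed those checks (a minimal sketch; it assumes tokenizer, model, and prompt are already defined as in the snippet above):
#################
import torch

# Count the prompt tokens; for my input this prints 714
print(len(tokenizer(prompt).input_ids))

# Confirm the GPU is not out of memory (well below the card's capacity in my case)
print(torch.cuda.memory_allocated(model.device) / 1024**3, "GiB allocated")
print(torch.cuda.memory_reserved(model.device) / 1024**3, "GiB reserved")
#################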
Do you have any idea what the root cause might be? Thank you!