Incomplete Output even with max_new_tokens
#26 by vermanic
So the output of my model ends abruptly, and I ideally want it to finish the paragraph/sentence/code block it was in the middle of.
I have set max_new_tokens = 300 and also ask in the prompt to limit the reply to 300 words.
The response is always long and ends abruptly. Is there any way I can get a complete output within the desired number of output tokens?
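One option (a minimal sketch, not something max_new_tokens does by itself): give the model a little headroom past the target budget and stop at the first sentence boundary once a soft minimum is reached, via a custom StoppingCriteria. The class name StopAtSentenceBoundary and the thresholds below are illustrative assumptions, not part of the transformers library:

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopAtSentenceBoundary(StoppingCriteria):  # hypothetical helper, not a library class
    """Stop once `min_new_tokens` have been generated and the text ends a sentence."""
    def __init__(self, tokenizer, prompt_len, min_new_tokens=250):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len
        self.min_new_tokens = min_new_tokens

    def __call__(self, input_ids, scores, **kwargs):
        if input_ids.shape[-1] - self.prompt_len < self.min_new_tokens:
            return False  # keep generating until the soft minimum is reached
        last_token = self.tokenizer.decode(input_ids[0, -1])
        return last_token.rstrip().endswith((".", "!", "?", "}"))

# usage, giving the model headroom past the soft limit:
# criteria = StoppingCriteriaList([StopAtSentenceBoundary(tokenizer, inputs.shape[-1])])
# outputs = model.generate(inputs, max_new_tokens=400, stopping_criteria=criteria)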
Code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "HuggingFaceH4/starchat-alpha"
device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda:X" for a specific GPU, "cpu" for CPU usage

class StarCoderModel:
    def __init__(self):
        print("Running on " + device)
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        # make sure `--gpus all` is provided in the docker run command if a GPU is required
        self.model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

    def infer(self, input_text, token_count):
        print(input_text)
        print(token_count)
        inputs = self.tokenizer.encode(input_text, return_tensors="pt").to(device)
        print(len(self.tokenizer.tokenize(input_text)))
        outputs = self.model.generate(inputs, max_new_tokens=token_count, pad_token_id=self.tokenizer.eos_token_id)
        # slice off the prompt by token count; character-based slicing of the
        # decoded string can misalign when decoding does not reproduce the input exactly
        return self.tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
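A likely contributor to the abrupt endings, separate from the token budget: starchat-alpha ends each assistant turn with the special <|end|> token, while the snippet above only stops on the default eos token, so generation tends to run until the max_new_tokens cap. A sketch of passing <|end|> as the stop token, assuming it is in this tokenizer's vocabulary as described on the model card:

# treat the chat turn delimiter as the stop token so generation can
# finish naturally before max_new_tokens is exhausted
end_id = self.tokenizer.convert_tokens_to_ids("<|end|>")
outputs = self.model.generate(
    inputs,
    max_new_tokens=token_count,
    eos_token_id=end_id,
    pad_token_id=end_id,
)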
Sample:
private DataType FuntionName(String someId) {
    // TODO: Replace with implementation that utilizes someId to obtain information
    return DataType.Value;
}
The comment:
- If someId is present in the code, use the getAPI from Client with someId as a parameter to obtain some information.
- If the