Not able to have an output with a smaller size than the given `max_length`
#3 · opened by Loheek
Hello, when using the 3B model I am not able to get an output shorter than the given max_length. It always gives a correct answer in the first tokens, and then outputs garbage to fill the sequence until max_length is reached.
I use the code in generate_openelm.py as a template.
I tried to adapt the different generation options described here, like the length penalty or changing the generation strategy, but without success.
Is it possible? Any advice would be very welcome.
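For context, here is roughly the kind of call I am making (a minimal sketch, not my exact script; the model and tokenizer IDs are assumptions taken from the OpenELM model card, adjust if yours differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed IDs: OpenELM ships without its own tokenizer, so the Llama 2 tokenizer is used.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=256,                        # hard upper bound on total length
    eos_token_id=tokenizer.eos_token_id,   # only ends early if the model actually emits EOS
    pad_token_id=tokenizer.eos_token_id,   # avoids the "no pad token" warning
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that length_penalty only has an effect with beam search, and eos_token_id can only end generation early if the model actually produces an EOS token, which a base (non-chat) model often never does mid-text.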
It seems that Apple has not released a chat template, so currently it only works for plain text generation (using Llama 2's tokenizer).
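As a rough workaround sketch (the stop string, prompt format, and model/tokenizer IDs below are illustrative assumptions, not an official recipe), you can cut generation short with a custom StoppingCriteria instead of relying on the model emitting EOS:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

class StopOnString(StoppingCriteria):
    """Stop generation as soon as a chosen string appears in the decoded output."""

    def __init__(self, tokenizer, stop_string):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids, scores, **kwargs):
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return self.stop_string in text

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # Llama 2 tokenizer, as above
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B", trust_remote_code=True)

prompt = "Q: What is the capital of France?\nA:"  # illustrative prompt format
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=128,
    stopping_criteria=StoppingCriteriaList([StopOnString(tokenizer, "\n\n")]),  # "\n\n" is arbitrary
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This does not make the base model follow instructions, it just truncates the output before the garbage fills up to max_length.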