Generates nonsense responses
Did you test the result?
For me, with the exact code from the model card (including the prompt), it answers with:
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
And for the input "how are you?" it says:
oldsoldssabsarmsarmsarmsarmsarmsarmarmsarmsarmsarmsarmsarmsarmsarmsarmsarmsarmsarmsarmsarmsarmsrachrachsrachrachsrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrachrach rachrachrachrachrachrachrachrachrachrachrachrachrachrachrach rachrachrach
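When testing a freshly converted model, a quick automated check for this kind of degenerate, repetitive output can save some manual inspection. The sketch below is a hypothetical helper, not part of CTranslate2: it assumes that zlib compression ratio is a reasonable proxy for repetition (highly repetitive text compresses extremely well), and the 0.2 threshold and 100-byte minimum are arbitrary assumptions, not tuned values.

```python
import zlib


def looks_degenerate(text: str, threshold: float = 0.2, min_length: int = 100) -> bool:
    """Heuristic: flag output that compresses suspiciously well (i.e. is highly repetitive)."""
    data = text.encode("utf-8")
    # Short strings are skipped: zlib's fixed overhead makes their ratio meaningless.
    if len(data) < min_length:
        return False
    ratio = len(zlib.compress(data)) / len(data)
    # Normal prose stays well above the (assumed) threshold; runs like "....." or
    # "rachrachrach..." shrink to a small fraction of their original size.
    return ratio < threshold
```

For example, `looks_degenerate("." * 800)` flags the first response above, while a short, normal answer is left alone.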
Yes, thanks for raising the issue.
The same problem happens with limcheekin/mpt-7b-storywriter-ct2.
I'm not sure whether something is missing in my conversion process or whether these models are not supported by CTranslate2.
You can try running the conversion yourself and testing it. Let me know if you manage to get it working on your PC.
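For anyone who wants to try the conversion themselves, here is a minimal sketch that wraps CTranslate2's `ct2-transformers-converter` CLI. It assumes `ctranslate2`, `transformers` and `torch` are installed; the output directory name is hypothetical, and `int8` is just one of the supported quantization types (others include `int8_float16` and `float16`).

```python
import subprocess
from typing import List


def converter_command(model: str, output_dir: str, quantization: str = "int8") -> List[str]:
    """Build a ct2-transformers-converter invocation for a Hugging Face model."""
    return [
        "ct2-transformers-converter",
        "--model", model,                # Hugging Face repo id or local path
        "--output_dir", output_dir,      # where the converted CTranslate2 model is written
        "--quantization", quantization,  # weight quantization type
        "--force",                       # overwrite output_dir if it already exists
    ]


def convert(model: str, output_dir: str, quantization: str = "int8") -> None:
    # Downloads the original model (several GB for a 3B model) and runs the converter.
    subprocess.run(converter_command(model, output_dir, quantization), check=True)
```

Usage would be something like `convert("lmsys/fastchat-t5-3b-v1.0", "fastchat-t5-3b-ct2")`. Note that the converter only writes the model weights and vocabulary; the tokenizer used at inference time is loaded separately, which matters for the tokenizer issues discussed further down.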
If you'd rather not do the model conversion yourself, I suggest trying the following repo, a different but similar model published by another CTranslate2 supporter:
https://huggingface.co/michaelfeil/ct2fast-RedPajama-INCITE-7B-Chat
I tried many small models (up to 7B); WizardLM-7B and fastchat give me the best summaries and answers, but fastchat wins on a few points:
- license: flan-t5 models are truly open source (Apache 2.0), unlike LLaMA models, which are not available for commercial use
- context: flan-t5 models (including fastchat) don't seem to have a problem with contexts longer than specified; I used 3K+ tokens, which is more than the 2048 specified in the config
- speed: maybe because it is smaller (only 3B), it is faster than 4-bit quantized WizardLM-7B, especially when testing with longer contexts
So my question :)
Are you going to upload a new, working version?
PS: in my tests, flan-alpaca-xl is not on par with fastchat, so it is not an option, even though both were trained from the same base model.
UPDATE:
flan-alpaca-gpt4-xl works as well as fastchat,
thanks for quantizing it!
You're most welcome regarding the quantized version of flan-alpaca-gpt4-xl. I'm glad it helps.
Thanks for sharing the testing outcomes of the fastchat-t5 model. It seems worth taking another look at the issue.
By the way, did you try running the conversion and quantization on fastchat-t5 yourself and testing it?
I created an issue regarding this matter at https://github.com/OpenNMT/CTranslate2/issues/1295
After the latest changes it seems to work, but it appends a backtick (`) to every word.
The response for "hi, how are you" is:
I?` I'm` good!` How` about` you?` How` are` you?
The response for a translation to German is:
Die` Haus` ist` wunderbar.
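If re-converting or swapping the tokenizer isn't possible right away, a purely cosmetic stopgap is to strip the stray character in post-processing. This is a hypothetical helper, not a fix: it hides the symptom rather than the tokenizer mismatch that causes it, and it would also remove legitimate backticks (for example in code snippets).

```python
import re


def strip_word_backticks(text: str) -> str:
    """Remove every backtick that directly follows a non-space character."""
    # The lookbehind keeps the preceding character; only the backtick itself is dropped.
    return re.sub(r"(?<=\S)`", "", text)
```

Applied to the German example, `strip_word_backticks("Die` Haus` ist` wunderbar.")` returns `"Die Haus ist wunderbar."`.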
After following the suggestion from the thread you posted and switching the tokenizer to the one from flan-alpaca-gpt4-xl-ct2, it works fine.
Thanks for sharing the solution. I updated the repo to use the tokenizer of flan-alpaca-gpt4-xl-ct2, so no switching is required.
Please verify and close the issue if there's no problem.
Thanks.
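For reference, running a converted T5-style model with a tokenizer loaded from a different source looks roughly like this. It is a minimal sketch following the usual CTranslate2 pattern for encoder-decoder models (`Translator.translate_batch` on string tokens); `model_dir` and `tokenizer_name_or_path` are placeholders, not the exact repo ids used here.

```python
def generate(model_dir: str, tokenizer_name_or_path: str, prompt: str) -> str:
    """Generate text with a CTranslate2 T5 model and an independently chosen tokenizer."""
    # Imported lazily so the helper can be defined without the packages installed.
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator(model_dir, device="cpu")
    tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name_or_path)

    # CTranslate2 consumes string tokens, so convert the encoded ids back to tokens.
    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = translator.translate_batch([input_tokens])

    # Take the best hypothesis and decode it with the same tokenizer.
    output_tokens = results[0].hypotheses[0]
    return tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens), skip_special_tokens=True)
```

Loading the model and tokenizer on every call is only for brevity; for repeated prompts you would create them once and reuse them.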
The new tokenizer partially works, but it doesn't recognize newlines (\n). That is expected from the original Flan-T5 tokenizer, but the fastchat-t5 tokenizer uses a special encoding to represent newlines.
Example:
input_text = "line1\nline2"
Flan T5 tokenizer: ['▁line', '1', '▁line', '2', '</s>']
fastchat-t5-3b tokenizer: ['▁line', '1', '\n', '▁line', '2', '</s>']
More information on how to use the fastchat tokenizer: https://github.com/OpenNMT/CTranslate2/issues/1220#issuecomment-1679749680
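To check quickly whether a given tokenizer keeps newlines, a small diagnostic like the one below can help. It is a hypothetical helper that takes any tokenize function, so it can be tried with either tokenizer; per the token lists above, it should return False for the Flan-T5 tokenizer and True for the fastchat-t5-3b one.

```python
from typing import Callable, List


def preserves_newlines(tokenize: Callable[[str], List[str]], text: str = "line1\nline2") -> bool:
    """Return True if `tokenize` emits an explicit newline token for `text`."""
    tokens = tokenize(text)
    # A tokenizer that normalizes newlines away yields no token containing one.
    return any("\n" in token for token in tokens)


# Hypothetical usage with a Hugging Face tokenizer (requires `transformers`):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("lmsys/fastchat-t5-3b-v1.0")
# preserves_newlines(lambda s: tok.convert_ids_to_tokens(tok.encode(s)))
```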