Much slower than StarCoder?
#4, opened by jiang719
In my experience, WizardCoder takes much longer (at least twice as long) than StarCoder to decode the same sequence.
I thought there were no architecture changes.
Are there any? If not, what could explain the much slower inference?
I think the possible reason is that WizardCoder tends to generate a much longer response than StarCoder.
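One way to check this is to compare throughput per generated token instead of total wall-clock time, so a longer reply does not look like slower decoding. A minimal sketch follows; the helper name and the numbers are made up for illustration and are not measurements of either model.

```python
def tokens_per_second(num_new_tokens: int, elapsed_seconds: float) -> float:
    """Decoding throughput: newly generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_new_tokens / elapsed_seconds


# Hypothetical numbers, not real measurements: one run produces a short
# reply, the other a reply four times longer that takes four times as long.
short_reply = tokens_per_second(num_new_tokens=128, elapsed_seconds=4.0)
long_reply = tokens_per_second(num_new_tokens=512, elapsed_seconds=16.0)

# Equal throughput means the extra time comes from output length alone.
print(short_reply, long_reply)
```

If the per-token throughput matches, the difference is output length; if it does not, something in the decoding path itself is slower.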
Same here, and WizardCoder uses more VRAM.
It could be due to use_cache being disabled. Is there any specific reason to disable it?
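To show why a disabled cache could matter this much, here is a rough cost model, not the actual model code. It only counts how many token positions are fed through the model during greedy decoding and ignores attention cost over cached keys; the function name and the example sizes are assumptions for illustration.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions fed through the model while generating new_tokens.

    With a key/value cache, the prompt is processed once and each later step
    feeds only the newest token. Without it, every step re-processes the
    whole sequence generated so far.
    """
    if new_tokens <= 0:
        return 0
    if use_cache:
        return prompt_len + (new_tokens - 1)
    # Step k (1-based) sees the prompt plus the k - 1 tokens generated so far.
    return sum(prompt_len + k - 1 for k in range(1, new_tokens + 1))


# Hypothetical sizes: a 100-token prompt and 200 generated tokens.
with_cache = positions_processed(100, 200, use_cache=True)
without_cache = positions_processed(100, 200, use_cache=False)
print(with_cache, without_cache)
```

In Hugging Face transformers, `use_cache` can also be passed to `generate` (for example `model.generate(**inputs, use_cache=True)`), which may be a quick way to test whether this is the cause.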