Seems very good as a draft model for speculative decoding!
#1
by
stduhpf
- opened
Thanks for doing this experiment. The inference speed is blazing fast, and the output quality is just good enough to get a decent acceptance rate when using it as a speculative decoding draft model (around 68% when paired with Llama 2 13B).
Now if only the vocabulary matched other Llama models more closely, that would be even better. (It didn't cause any issues, but it might in some cases.)