Seems very good as a draft model for speculative decoding!
#1
by
stduhpf
- opened
Thanks for doing this experiment. The inference speed is blazing fast, and the output quality is just good enough to get a decent acceptance rate when using it as a speculative decoding draft model (around 68% when paired with Llama 2 13B).
Now if only the vocabulary matched other Llama models more closely, that would be even better. (It didn't cause any issues, but it might in some cases.)