Caching doesn't work on multi-GPU
#23 opened by eastwind
I get gibberish output when caching is enabled during inference across multiple GPUs.
@eastwind, so you do not get gibberish every time?
Would you kindly post some non-gibberish examples?
What did you do to go from gibberish to English?
@eastwind I have now found your contribution here, which answers the last question. Thanks!
https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/20
Yeah, not using the cache hurts performance a lot.
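For a rough sense of why, here is a toy count of how many token positions the model has to process during generation with and without the cache. It assumes "cache" here means the key/value cache that `use_cache` toggles in `transformers`. The function name `positions_processed` and the example lengths are made up for illustration. It is a sketch, not a measurement of Falcon, and it ignores the cost of attending over cached keys.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions fed through the model while generating new_tokens.

    Hypothetical helper for illustration only; it models the input length
    per decoding step, not real GPU time.
    """
    total = 0
    for step in range(new_tokens):
        if use_cache and step > 0:
            # With a KV cache, only the newest token is fed after the first step.
            total += 1
        else:
            # Without a cache (or on the first step), the whole sequence so far is fed.
            total += prompt_len + step
    return total


if __name__ == "__main__":
    # Assumed example sizes: a 100-token prompt and 200 generated tokens.
    with_cache = positions_processed(100, 200, use_cache=True)
    without_cache = positions_processed(100, 200, use_cache=False)
    print(with_cache, without_cache)
```

The gap grows roughly quadratically with the number of generated tokens, which is why turning the cache off is only a stopgap for the gibberish problem.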
We recommend using Text Generation Inference for fast inference with Falcon. See this blog for more information.