Caching doesn't work on multi-GPU

#23

by eastwind - opened Jun 2, 2023

Discussion

eastwind

Jun 2, 2023

I get gibberish output when caching is enabled and inference runs across multiple GPUs.

captain-fim

Jun 4, 2023

@eastwind, so you do not get gibberish every time?
Would you kindly post some non-gibberish examples?
What did you do to go from gibberish to English?

captain-fim

Jun 4, 2023

@eastwind I have now found your contribution here, which answers my last question. Thanks!
https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/20

eastwind

Jun 4, 2023

Yeah, not using the cache hurts performance a lot.
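
For a rough sense of the cost, here is a toy sketch (not Falcon's actual code) that counts how many token positions need their keys and values projected, per layer, during autoregressive generation. The function name `count_kv_projections` and the numbers are made up for illustration. It assumes that without a cache the whole sequence is re-processed at every step, which is roughly what happens when `use_cache=False` is passed to `generate` in transformers.

```python
def count_kv_projections(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions whose keys/values are projected (per layer)
    while generating `new_tokens` tokens after a prompt of `prompt_len`."""
    total = 0
    seq_len = prompt_len
    for step in range(new_tokens):
        if use_cache and step > 0:
            # With a cache, only the most recently added token is projected;
            # earlier keys/values are reused.
            total += 1
        else:
            # Without a cache (or on the very first step), every position
            # in the sequence so far is projected again.
            total += seq_len
        seq_len += 1
    return total


if __name__ == "__main__":
    cached = count_kv_projections(prompt_len=100, new_tokens=200, use_cache=True)
    uncached = count_kv_projections(prompt_len=100, new_tokens=200, use_cache=False)
    print(f"with cache: {cached}, without cache: {uncached}")
```

The gap grows quadratically with the number of generated tokens, so disabling the cache is a workaround for the gibberish rather than a fix.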

FalconLLM

Technology Innovation Institute org Jun 9, 2023

We recommend using Text Generation Inference for fast inference with Falcon. See this blog for more information.
