I think it just takes a while during busy times to get a GPU from the cluster. You just need to wait a bit.
Can we not use bitsandbytes with ZeroGPU?
In addition, I would like to know how to use Flash Attention with ZeroGPU!
- You can use bitsandbytes, I've used it myself.
- I'm not sure about Flash Attention; I haven't needed it.
Well, you still exceeded your quota. The quota is fixed for any kind of usage, because you still use a costly GPU when debugging.
For the first problem:
Add accelerate to requirements.txt, use @spaces.GPU(queue=False),
and use the default theme and UI (yes, the UI causes this issue; I found that out today).
For the second one:
Use this: @spaces.GPU(queue=False, duration=30)
The duration is in seconds. Choose a value that meets your needs. However, if a query exceeds this duration, the task will be terminated.
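As a rough illustration of the decorator advice above, here is a minimal sketch. It assumes the `spaces` package exposes `spaces.GPU` with a `duration` argument in seconds; the `generate` function and its body are hypothetical placeholders, not code from anyone's Space. The fallback decorator is only there so the same file also runs where `spaces` is not installed.

```python
# Minimal sketch: wrap the GPU-bound function with a time limit.
# Assumption: `spaces.GPU(duration=...)` takes a duration in seconds.
try:
    import spaces
    gpu_task = spaces.GPU(duration=30)
except ImportError:
    # Hypothetical fallback for local runs without the `spaces` package:
    # a no-op decorator that returns the function unchanged.
    def gpu_task(fn):
        return fn


@gpu_task
def generate(prompt: str) -> str:
    # Placeholder for real model inference (hypothetical).
    return prompt.upper()


print(generate("hello"))
```

Only the function that actually needs the GPU should carry the decorator, so the time limit covers inference and nothing else.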
Yeah, that's normal for the ZeroGPU runtime. It should still work; at least it worked for me.
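I solved the issue of installing Flash Attention:
# flash attention: install at startup, skipping the CUDA build
import os
import subprocess
# Merge with os.environ so PATH and other variables are kept
env = {**os.environ, 'FLASH_ATTENTION_SKIP_CUDA_BUILD': 'TRUE'}
subprocess.run('pip install flash-attn --no-build-isolation', env=env, shell=True)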
I hope to install the causal-conv1d and mamba-ssm libraries too :)
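Since the flash-attn install above runs every time the app starts, one option is to check first whether the module is already importable. This is only a sketch: `needs_install` is a hypothetical helper name, and it assumes the package's import name is `flash_attn`.

```python
import importlib.util


def needs_install(module_name: str) -> bool:
    """Return True when the import system cannot find `module_name`."""
    return importlib.util.find_spec(module_name) is None


# Assumption: flash-attn is imported as `flash_attn`.
if needs_install("flash_attn"):
    print("flash_attn not found; run the pip install step here")
else:
    print("flash_attn already installed; skipping pip")
```

The same check could guard other runtime installs, using each package's import name.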