Very Slow Generation on Google Colab

#1
by delitante-coder - opened

I am loading the model in 4-bit with bnb_4bit_compute_dtype = torch.bfloat16, and generation is very slow.
Is anyone else facing the same issue?
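
For anyone trying to reproduce this, here is a rough sketch of the setup described above plus a small timing helper, so "very slow" can be reported as tokens per second. It assumes the transformers BitsAndBytesConfig class is what is being used for 4-bit loading. The helper names (build_quant_config, tokens_per_second, time_generation) and the MODEL_ID placeholder are hypothetical and not from the original post.

```python
import time


def build_quant_config():
    """Build a 4-bit config like the one described in the post (assumed setup)."""
    # Imported lazily so the timing helpers below work without the GPU stack.
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )


def tokens_per_second(new_tokens, elapsed_seconds):
    """Return generation throughput; rejects non-positive durations."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return new_tokens / elapsed_seconds


def time_generation(model, tokenizer, prompt, max_new_tokens=64):
    """Time a single generate() call and return tokens per second."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    # Count only the newly generated tokens, not the prompt tokens.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    return tokens_per_second(new_tokens, elapsed)


# Hypothetical usage (MODEL_ID is a placeholder, not the model from this thread):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, quantization_config=build_quant_config(), device_map="auto"
# )
# print(time_generation(model, tokenizer, "Hello"))
```

Posting the measured tokens per second along with the GPU type shown by the Colab runtime would make it easier for others to compare.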
