I load the model for inference on 2 H100 GPUs, but inference is very slow with Flash Attention.
Can you share your code?
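In the meantime, here is a minimal sketch of a typical setup for loading a causal LM with Flash Attention 2 across two GPUs via the Hugging Face Transformers API; the model id, dtype, and prompt are placeholders, not your actual configuration:

```python
# Minimal sketch: load a causal LM for inference on two GPUs
# with Flash Attention 2 (assumes flash-attn and accelerate are installed;
# the model id below is a placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder, replace with the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # FlashAttention needs fp16/bf16, not fp32
    attn_implementation="flash_attention_2",
    device_map="auto",                        # shards the weights across both H100s
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

One thing to check: with `device_map="auto"`, the weights are split across the GPUs and executed sequentially (pipeline style), so if the model fits on a single H100, running it on one GPU can actually be faster than splitting it over two. Also make sure the model is loaded in fp16 or bf16; Flash Attention brings no benefit in fp32.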