The model is not optimized in terms of inference

#7
by Imran1 - opened

I loaded the model for inference on 2 H100 GPUs, but generation is very slow even with Flash Attention.
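For context, here is a minimal sketch of the kind of setup being described (the poster's actual code was never shared; the checkpoint ID, dtype, and generation settings below are assumptions):

```python
# Hypothetical reconstruction of the setup described above, not the
# poster's actual code. Requires: transformers, accelerate, flash-attn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # half precision to fit across 2 GPUs
    device_map="auto",                        # shard layers over both H100s (via accelerate)
    attn_implementation="flash_attention_2",  # Flash Attention 2 kernels
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

One likely source of the slowness: `device_map="auto"` splits the layers sequentially across the two GPUs, so only one GPU is active at a time during generation. Dedicated serving stacks such as vLLM or Text Generation Inference, which support tensor parallelism, are usually what "optimized for inference" means in practice.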

Can you share your code?
