model aya-expanse-8b inference is very slow

#16
by blueqq1

I attempted to run the model with plain HF Transformers, HF + FlashAttention 2, and vLLM, but the speed was roughly the same in each case. The prompt was "The future of AI is", with temperature=0.3 and max_token=512. Generation took approximately 10 seconds on both an A40 and an A100 GPU.

The dtype is bfloat16.
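
For reference, here is a minimal timing sketch of the HF + FlashAttention 2 setup described above, assuming the model id is `CohereForAI/aya-expanse-8b` and that `max_token=512` in the post corresponds to `max_new_tokens=512`:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # dtype from the post
    attn_implementation="flash_attention_2",    # HF + FA2 configuration
    device_map="auto",
)

inputs = tokenizer("The future of AI is", return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3,
)
elapsed = time.perf_counter() - start

# Report tokens/second so runs on different GPUs are comparable
generated = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

And the equivalent vLLM run, under the same assumptions:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="CohereForAI/aya-expanse-8b", dtype="bfloat16")
params = SamplingParams(temperature=0.3, max_tokens=512)

outputs = llm.generate(["The future of AI is"], params)
print(outputs[0].outputs[0].text)
```

Measuring tokens/second rather than wall-clock time alone would help confirm whether the ~10 s is actually slow for 512 generated tokens, or expected single-request decoding speed.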
