Why is inference so much slower than Qwen at the same 7B parameter count?

#26

by lucasjin - opened Jul 4

In my experience it feels about 30% slower than Qwen 7B when chatting on the same A100 GPU.
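To turn that impression into a number, both models could be timed with the same prompt and generation settings. Below is a minimal sketch, not a definitive benchmark. `tokens_per_second` and `relative_slowdown` are hypothetical helper names, and `generate` is assumed to be a zero-argument callable that runs one generation and returns only the newly generated token ids. On a GPU the callable should also wait for the device to finish (for example with `torch.cuda.synchronize()`) before returning, otherwise the timing will be too optimistic.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[], Sequence[int]], runs: int = 3) -> float:
    """Average decoding throughput of `generate` over `runs` timed calls.

    `generate` is a hypothetical zero-argument callable (an assumption of
    this sketch) that performs one generation and returns the new token ids.
    """
    generate()  # warm-up call, excluded from the timing
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate()
        total_seconds += time.perf_counter() - start
        total_tokens += len(new_tokens)
    return total_tokens / total_seconds


def relative_slowdown(baseline_tps: float, candidate_tps: float) -> float:
    """Fraction by which the candidate is slower than the baseline."""
    return 1.0 - candidate_tps / baseline_tps
```

Wrapping each model's generation call in such a callable and comparing the two throughput figures with `relative_slowdown` would show whether the gap is really around 30%.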
