Slow generation
#18, opened by tomer
Hi, I'm trying to generate with Qwen-7B and I think I may be missing something. The model is a lot slower than Llama-2-7b even though I'm using the recommended packages in the Qwen modeling code – I installed the latest stable flash_attn version, and also installed the flash_attn RMS norm implementation from source. Do you know what could be wrong?
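
To put a number on "a lot slower", here is a minimal sketch of how the two models could be timed on equal terms. It is not taken from the Qwen or Llama code. The helper `tokens_per_second` and the stand-in `fake_generate` are hypothetical names introduced only for illustration, and the sketch assumes the generate call returns the prompt tokens followed by the new tokens.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[], Sequence[int]],
    prompt_len: int,
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Return the average number of newly generated tokens per second.

    `generate` is any zero-argument callable that returns the full output
    sequence (prompt + new tokens). This is an assumption of the sketch.
    """
    # Warm-up calls are not timed; the first call often pays one-off costs
    # such as kernel compilation or memory allocation.
    for _ in range(warmup):
        generate()

    total_new_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        output_ids = generate()
        total_seconds += time.perf_counter() - start
        # Count only the tokens produced after the prompt.
        total_new_tokens += len(output_ids) - prompt_len
    return total_new_tokens / total_seconds


def fake_generate() -> list:
    """Hypothetical stand-in for a real model call: 8 prompt + 16 new tokens."""
    time.sleep(0.01)
    return list(range(8 + 16))


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, prompt_len=8)
    print(f"{rate:.1f} new tokens/s")
```

With real models, `generate` would presumably wrap each model's own generation call with the same prompt, the same `max_new_tokens`, and the same dtype and device, so that the two figures are comparable. On a GPU it is probably also worth calling `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.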