When torch.nn.functional.scaled_dot_product_attention falls back to _scaled_dot_product_attention_math, the model raises an error
#3 opened by Quasimodo0808
If the SDPA call in visual.py::attention_fn_default() uses the math kernel, its output is contiguous in the (B, num_heads, L, head_dim) layout. The output is then transpose()d, which makes it non-contiguous, and view() is called on it: https://huggingface.co/THUDM/cogvlm2-video-llama3-chat/blob/main/visual.py#L78. That view() call will raise an error.
You can try changing output = self.dense(out.view(B, L, -1))
to output = self.dense(out.reshape(B, L, -1))
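For reference, a minimal sketch of the failure and the fix (shapes are illustrative, not the actual CogVLM2 configuration, and it assumes PyTorch 2.3+ for torch.nn.attention.sdpa_kernel):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative shapes only, not the real model dimensions.
B, H, L, D = 2, 8, 16, 64
q = k = v = torch.randn(B, H, L, D)

# Force the math kernel, mirroring the fallback described above.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)  # contiguous in (B, H, L, D)

out_t = out.transpose(1, 2)       # (B, L, H, D): the transpose makes it non-contiguous
print(out_t.is_contiguous())      # False

# out_t.view(B, L, -1)            # RuntimeError: view size is not compatible with input tensor's size and stride
merged = out_t.reshape(B, L, -1)  # reshape() falls back to a copy when a view is impossible, so it succeeds
print(merged.shape)               # torch.Size([2, 16, 512])
```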
Is view() used here because you were assuming SDPA's flash-attention kernel?