minimal example with flash attention
#28
by keepitsane · opened
Hello, just curious whether there is a minimal example demonstrating how to use flash attention. Thanks in advance!
Please refer to the documentation at https://huggingface.co/docs/transformers/main/en/model_doc/mistral#combining-mistral-and-flash-attention-2 — simply adding attn_implementation="flash_attention_2" to AutoModel.from_pretrained(...) will do the trick. Note that FlashAttention-2 requires an Ampere or newer GPU (e.g., A100).
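A minimal sketch of that setup, assuming the flash-attn package is installed and using an illustrative Mistral checkpoint (swap in whatever model you actually use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # FlashAttention-2 needs fp16 or bf16
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
    device_map="auto",                        # requires accelerate
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```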
keepitsane changed discussion status to closed
FlashAttention is now also available via torch SDPA (torch.nn.functional.scaled_dot_product_attention) if you use torch>=2.2 and transformers>=4.37.1.
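A sketch of that route, assuming torch>=2.2 and transformers>=4.37.1 (the checkpoint name is again only illustrative); the SDPA backend can dispatch to FlashAttention kernels internally, so no separate flash-attn install is needed:

```python
import torch
from transformers import AutoModelForCausalLM

# "sdpa" routes attention through torch.nn.functional.scaled_dot_product_attention,
# which can use FlashAttention kernels on supported GPUs with torch>=2.2.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # illustrative checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
```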
Best, Michael Feil