1 unstable release: 0.0.1 (Apr 15, 2025)
2MB, 849 lines
Candle Flash Attention v1 Layer
Flash Attention v2 requires Ampere-or-newer GPUs, so it does not run on Turing GPUs (e.g. T4, RTX 2080). Until that changes, this layer can be used as a drop-in replacement for the official Candle flash-attention layer on those cards.
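A minimal sketch of the drop-in usage, assuming this crate exposes a `flash_attn` function mirroring the official `candle-flash-attn` API (`flash_attn(q, k, v, softmax_scale, causal)` over `(batch, seq_len, num_heads, head_dim)` tensors); the crate path `candle_flash_attn_v1` and that signature are assumptions, not confirmed by this listing:

```rust
// Hedged sketch: swaps the official candle_flash_attn call for this crate's
// v1 kernel on Turing GPUs. Requires a CUDA-enabled build of Candle.
use candle_core::{DType, Device, Result, Tensor};

fn attention(q: &Tensor, k: &Tensor, v: &Tensor) -> Result<Tensor> {
    // q/k/v layout assumed: (batch, seq_len, num_heads, head_dim)
    let head_dim = q.dim(3)? as f64;
    let softmax_scale = (1.0 / head_dim.sqrt()) as f32;
    // On Ampere+ you would call the official layer instead:
    //   candle_flash_attn::flash_attn(q, k, v, softmax_scale, /*causal=*/true)
    // Turing fallback via this crate (name and signature assumed):
    candle_flash_attn_v1::flash_attn(q, k, v, softmax_scale, /*causal=*/true)
}

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    // Flash-attention kernels expect half precision.
    let q = Tensor::randn(0f32, 1.0, (1, 128, 8, 64), &dev)?.to_dtype(DType::F16)?;
    let k = q.clone();
    let v = q.clone();
    let out = attention(&q, &k, &v)?;
    println!("{:?}", out.shape()); // same shape as q
    Ok(())
}
```

Since the call site is a single function swap, switching back to `candle_flash_attn::flash_attn` once v2 supports your GPU is a one-line change.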
Dependencies: ~16–23MB, ~404K SLoC