b2862
dc685be4 · CUDA: add FP32 FlashAttention vector kernel (#7188) · May 12, 2024