# Flash-Attention-Implementation

Implementation of Flash-Attention (both forward and backward) with PyTorch, LibTorch, CUDA, and Triton.

## Getting Started

### PyTorch

```bash
cd flashattn/pytorch
python flashattn.py
```

### LibTorch

```bash
cd flashattn/libtorch
python test.py
```

### CUDA

TODO

### Triton

TODO
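The core trick behind Flash-Attention's forward pass is the *online softmax*: the score matrix is processed in blocks while only a running max and running sum are kept, so the full matrix never needs to be materialized. As a rough illustration (not this repo's implementation, and the function name and block size are made up here), a minimal pure-Python sketch of that streaming, numerically stable softmax:

```python
import math

def streaming_softmax(scores, block_size=2):
    """Softmax computed one block at a time with a running max `m` and
    running sum `l` -- the rescaling trick Flash-Attention uses so the
    full score row never has to be held at once. Illustrative only."""
    m = float("-inf")  # running max seen so far
    l = 0.0            # running sum of exp(score - m)
    seen = []          # kept here only so we can emit the final result
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        m_new = max(m, max(block))
        # rescale the old sum to the new max, then fold in this block
        l = l * math.exp(m - m_new) + sum(math.exp(s - m_new) for s in block)
        m = m_new
        seen.extend(block)
    # normalize with the global (m, l) accumulated in a single sweep
    return [math.exp(s - m) / l for s in seen]
```

In the real kernel the same rescaling is applied to a running weighted sum of value vectors, which is what lets the forward pass run in a single sweep over tiles of K and V.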