LLM Attention Kernels and GPU Primitives
· ☕ 1 min read · ✍️ k4i
A series index for LLM attention kernels and GPU primitives: fused softmax, online softmax, FlashAttention, PagedAttention kernels, Triton/CUDA, and memory-access optimization.