A series index for LLM attention kernels and GPU primitives: fused softmax, online softmax, FlashAttention, PagedAttention kernels, Triton/CUDA, and memory-access optimization.
A step-by-step explanation of positional encoding in Transformers, from absolute position embeddings to sinusoidal encodings, Euler's formula, and rotary position embeddings (RoPE).