Estimating Compute and Memory Requirements for LLM Training and Inference
May 27, 2026 · May 28, 2026 · 14 min read · k4i
A back-of-the-envelope framework for estimating LLM training FLOPs, inference FLOPs, weight memory, KV cache, and training memory.
Why KV Cache Works in LLM Inference
Apr 20, 2026 · May 30, 2026 · 8 min read · k4i
Why the key-value cache avoids redundant computation in autoregressive decoding, and the memory/compute tradeoffs it introduces.