Decode

Prefill vs Decode: Why One Model Has Two Very Different Bottlenecks

📅 Jun 5, 2026 · ☕ 8 min read · ✍️ k4i

Why LLM inference splits into a compute-bound prefill phase and a memory-bandwidth-bound decode phase, and how that split explains time to first token (TTFT), time per output token (TPOT), batching, KV cache pressure, and serving-engine design.
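As a rough illustration of the compute-bound vs memory-bandwidth-bound split, here is a minimal roofline-style sketch. It assumes a single d×d fp16 linear layer and ignores attention over the KV cache. The hidden size, peak FLOP/s, and bandwidth figures are hypothetical placeholders, not measurements of any real accelerator, and the helper names `arithmetic_intensity` and `classify` are illustrative only.

```python
"""Roofline-style sketch: why prefill tends to be compute-bound and decode
memory-bandwidth-bound. Models one d x d linear layer only (no attention,
no KV cache reads); all hardware numbers are hypothetical placeholders."""

BYTES_PER_ELEM = 2  # assume fp16/bf16 weights and activations


def arithmetic_intensity(n_tokens: int, d_model: int) -> float:
    """FLOPs per byte moved for Y = X @ W with X: [n_tokens, d], W: [d, d]."""
    flops = 2 * n_tokens * d_model * d_model  # 1 multiply + 1 add per MAC
    weight_bytes = BYTES_PER_ELEM * d_model * d_model  # W read once, shared by all tokens
    act_bytes = BYTES_PER_ELEM * 2 * n_tokens * d_model  # read X, write Y
    return flops / (weight_bytes + act_bytes)


def classify(n_tokens: int, d_model: int, peak_flops: float, mem_bw: float) -> str:
    """Compare intensity against the ridge point peak_flops / mem_bw."""
    ridge = peak_flops / mem_bw  # FLOPs/byte where the two limits meet
    if arithmetic_intensity(n_tokens, d_model) >= ridge:
        return "compute-bound"
    return "memory-bandwidth-bound"


if __name__ == "__main__":
    D_MODEL = 4096        # hypothetical hidden size
    PEAK_FLOPS = 1000e12  # hypothetical 1000 TFLOP/s
    MEM_BW = 3e12         # hypothetical 3 TB/s, giving a ridge of about 333 FLOPs/byte

    # In decode, each sequence contributes one new token per step,
    # so n_tokens for this layer equals the decode batch size.
    cases = {
        "prefill, 2048-token prompt": 2048,
        "decode, batch of 1": 1,
        "decode, batch of 64": 64,
    }
    for label, n in cases.items():
        ai = arithmetic_intensity(n, D_MODEL)
        print(f"{label:28s} {ai:8.1f} FLOPs/byte -> "
              f"{classify(n, D_MODEL, PEAK_FLOPS, MEM_BW)}")
```

With these placeholder numbers, the 2048-token prefill lands at 1024 FLOPs/byte, well above the ridge, while single-sequence decode sits near 1 FLOP/byte: every weight byte is fetched to do roughly one FLOP. Batching 64 decode streams raises that to about 62 FLOPs/byte, which is why serving engines batch decode aggressively. In a real model, though, the per-sequence KV cache reads in attention do not amortize across the batch this way.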