Prefill vs Decode: Why One Model Has Two Very Different Bottlenecks
8 min read · k4i
Why LLM inference splits into a compute-bound prefill phase and a memory-bandwidth-bound decode phase, and how that explains TTFT, TPOT, batching, KV cache pressure, and serving-engine design.