A Survey of LLM Quantization: From Linear Quantization to Codebooks
29 min read · k4i
A practical survey of LLM quantization, covering linear quantization, codebook quantization, LLM.int8(), SmoothQuant, GPTQ, AWQ, NF4, AQLM, KV cache quantization, and FP8.

