LLM Quantization and Low-Precision Serving
· ☕ 1 min read · ✍️ k4i
A series index for LLM quantization and low-precision serving: INT8/INT4, GPTQ, AWQ, SmoothQuant, NF4, AQLM, KV cache quantization, FP8 serving, and quality/speed/memory tradeoffs.