Archive – k4i's blog

2026

posts Jul 14 Entropy, Cross Entropy, And KL Divergence: A Coding-Cost View
posts Jul 7 Why KL Divergence Is Not A Distance: Direction Changes The Question
posts Jul 2 Common Probability Distributions: Variance And Standard Deviation
posts Jun 29 Optimizers: From SGD To AdamW
posts Jun 23 vLLM Scheduler: How Request Queues Become SchedulerOutput
posts Jun 23 vLLM ModelRunner: How SchedulerOutput Becomes a GPU Forward
posts Jun 23 Numeric Types in Neural Networks: FP32, BF16, FP8, INT8, and INT4
posts Jun 23 Loss Functions: What a Model Is Really Optimizing
posts Jun 18 LLM Inference Sampling: What Temperature, Top-p, and Top-k Actually Control
posts Jun 18 Three Routes For Embodied Models: VLA, World Models, And WAM
posts Jun 18 Activation Functions: The Small Nonlinearity That Shapes a Network
posts Jun 17 Streaming Design: Why The Application Layer Still Matters
posts Jun 7 vLLM Request Lifecycle: From OpenAI API to One Forward Pass
posts Jun 5 Prefill vs Decode: Why One Model Has Two Very Different Bottlenecks
posts Jun 5 LLM Attention Kernels and GPU Primitives
posts Jun 5 LLM Quantization and Low-Precision Serving
posts Jun 5 LLM Inference Lab Reports: Experiments and Benchmarks for Serving Systems
posts Jun 4 vLLM / SGLang Source Reading: From Request to Forward Pass
posts Jun 4 LLM Inference Internals: Core Mechanisms for Serving Engines
posts Jun 1 A Survey of LLM Quantization: From Linear Quantization to Codebooks
