Gradient-Descent

Optimizers: From SGD To AdamW

📅 Jun 29, 2026 · ☕ 10 min read · ✍️ k4i

A mechanism-first guide to optimizers: what SGD, momentum, RMSProp, Adam, and AdamW each solve, why AdamW became a strong default for modern deep learning, and when other optimizers still matter.

Loss Functions: What a Model Is Really Optimizing

📅 Jun 23, 2026 · ☕ 9 min read · ✍️ k4i

A practical guide to loss functions: when to use MSE, MAE, Huber, binary cross-entropy, cross-entropy, KL divergence, hinge loss, contrastive loss, and triplet loss.

Activation Functions: The Small Nonlinearity That Shapes a Network

📅 Jun 18, 2026 · ☕ 8 min read · ✍️ k4i

A mechanism-first guide to activation functions: why neural networks need nonlinearities, how sigmoid, tanh, ReLU, GELU, and SiLU differ, and why a 400-function survey is best read as a map rather than a menu.