Archive – k4i's blog

2026

posts May 28 From Absolute Positional Encoding to RoPE: Why Position Can Be a Rotation
posts May 27 Estimating Compute and Memory Requirements for LLM Training and Inference
posts May 23 Agent Skill Management: Turning AI Assistants from Clever to Reliable
posts Apr 22 Disaggregated Prefill: Splitting Compute Across Machines
posts Apr 22 Prefix Caching: Reusing KV Cache Across Requests
posts Apr 22 Chunked Prefill: Slicing the Prefill to Protect Decode Latency
posts Apr 22 Continuous Batching: Scheduling at Iteration Granularity
posts Apr 22 Paged Attention: Virtual Memory for the GPU
posts Apr 21 Online Softmax: Tiling for Arbitrarily Large Rows
posts Apr 20 Why KV Cache Works in LLM Inference
posts Apr 20 Fused Softmax in Triton
posts Apr 19 SSH Port Forwarding: Local and Remote Tunnels Explained
posts Mar 22 Mitmproxy + Tampermonkey = better {llm, …} viewer
posts Feb 16 Batch vs Stochastic Gradient Descent
posts Feb 16 Forward & Backward Propagation

2024

2023

posts Dec 19 Cycle Finding Algorithms
posts Jan 21 Connect to your Android wirelessly
