Writing
What I Learned Profiling PyTorch Memory Leaks Across Two Backends
June 3, 2026 · 15 min read
Eight leak patterns on MPS and CUDA — CUDA dampens leak signals but does not hide them, and the bugs that matter show up everywhere.
Diagnosing a PyTorch Memory Leak on Apple MPS
May 15, 2026 · 2 min read
How Stormlog helped us find and fix a subtle GPU memory leak that only appeared on Apple Silicon.
I Helped Build a GPU Memory Profiler. Then I Had to Learn What GPU Memory Actually Is.
March 17, 2026 · 8 min read
Walking through the Stormlog tutorial after shipping the tool — from PyTorch counters to a deliberate leak, OOM evidence, and the fix.
OOM Flight Recorder Ring-Buffer: Deep Dive
February 14, 2026 · 11 min read
Why a ring-buffer flight recorder for GPU OOM fills a gap that PyTorch snapshots and NCCL tracing leave open: temporal context, automatic dumps, and structured artifacts.