Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler
This beginner-friendly guide walks through using torch.profiler to analyze a matrix multiplication followed by an addition, revealing how the CPU and GPU coordinate and how torch.compile fuses operations to reduce kernel launch overhead.
Why it was selected: with `torch.profiler.profile` and `record_function`, you can easily capture CPU/GPU events and the kernel call chain, and generate an interactive trace file.
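
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the guide's own code: the function name `matmul_add`, the 256×256 tensor sizes, the `record_function` label, and the trace filename are all assumptions chosen for the example. It falls back to CPU-only profiling when no GPU is available.

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, record_function


def matmul_add(a, b, c):
    # The operation under study: a matrix multiplication followed by an addition.
    return a @ b + c


# Use the GPU when present; otherwise profile CPU activity only.
device = "cuda" if torch.cuda.is_available() else "cpu"
a, b, c = (torch.randn(256, 256, device=device) for _ in range(3))

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    # record_function attaches a named range to the trace (label is hypothetical).
    with record_function("matmul_add"):
        out = matmul_add(a, b, c)
    if device == "cuda":
        # Wait for queued GPU work so its kernels fall inside the profiled region.
        torch.cuda.synchronize()

# Summary table of the recorded operators, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# Write a Chrome-format trace file (path is an assumption for this sketch).
trace_path = os.path.join(tempfile.gettempdir(), "matmul_add_trace.json")
prof.export_chrome_trace(trace_path)
```

The exported JSON file can be opened in a trace viewer such as chrome://tracing or Perfetto to inspect the timeline interactively. Wrapping `matmul_add` with `torch.compile` and profiling it the same way is one way to compare the recorded kernels before and after compilation.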
