Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

Test: Run `01_matmul_add.py` with size 4096 on the GPU and analyse the profiler output to identify GPU idle time.
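One way to approach the idle-time analysis is to merge the GPU kernel intervals in the exported Chrome trace and measure the gaps between them. The sketch below is not taken from the article. It assumes the trace is a JSON object with a `traceEvents` list, and that kernel events are complete events (`"ph": "X"`) with a `cat` of `kernel` and `ts`/`dur` in microseconds. Check these fields against your own trace, since they can differ between PyTorch versions. The function names are hypothetical.

```python
import json


def load_trace_events(path):
    """Load events from a Chrome-trace JSON file (object or bare list)."""
    with open(path) as f:
        data = json.load(f)
    return data["traceEvents"] if isinstance(data, dict) else data


def gpu_busy_idle_us(events, kernel_cats=("kernel",)):
    """Return (busy_us, idle_us) between the first kernel start and last kernel end.

    Assumption: GPU kernels are "X" events whose category is in kernel_cats.
    """
    intervals = sorted(
        (e["ts"], e["ts"] + e["dur"])
        for e in events
        if e.get("ph") == "X" and str(e.get("cat", "")).lower() in kernel_cats
    )
    if not intervals:
        return 0.0, 0.0

    busy = 0.0
    first_start = intervals[0][0]
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:
            # Gap between kernels: close the current busy interval.
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping kernels (e.g. multiple streams): extend the interval.
            cur_end = max(cur_end, end)
    busy += cur_end - cur_start

    span = cur_end - first_start
    return busy, span - busy


# Synthetic events for illustration only, not real profiler output.
demo_events = [
    {"ph": "X", "cat": "kernel", "ts": 0, "dur": 10},
    {"ph": "X", "cat": "kernel", "ts": 5, "dur": 7},
    {"ph": "X", "cat": "kernel", "ts": 20, "dur": 5},
    {"ph": "X", "cat": "cpu_op", "ts": 0, "dur": 100},
]
busy_us, idle_us = gpu_busy_idle_us(demo_events)
```

A large idle share relative to busy time suggests the GPU is waiting on CPU-side launches rather than computing.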

Summary

Profiling in PyTorch (Part 1) walks developers through using torch.profiler to uncover bottlenecks in a simple matrix multiplication and bias addition routine. The example script `01_matmul_add.py` runs on an NVIDIA A100‑SXM4‑80GB GPU and shows that a 64×64 workload is heavily CPU‑bound, with less than 1% of the time spent in the GPU kernel. When the matrix size is increased to 4096×4096, the profiler shows a shift to compute‑bound execution: GPU time rises to 4.5 ms and the kernel `ampere_bf16_s16816gemm` dominates the trace. The profiler produces two artifacts: a table of per‑operator statistics via `prof.key_averages().table`, and a Chrome‑trace JSON file via `prof.export_chrome_trace`, which can be visualised in Perfetto. Developers can annotate scopes with `torch.profiler.record_function` to label events, and control which steps are profiled with `torch.profiler.schedule(wait=1, warmup=1, active=3)`. The guide emphasises that larger workloads amortise the fixed CPU‑side cost of launching each kernel, and that the workload should be run for several iterations so that the GPU is warmed up before measurements are taken. By analysing the trace, developers can identify idle periods, kernel launch latency, and opportunities to batch operations for better GPU utilisation.
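The workflow described above can be sketched roughly as follows. This is not the article's `01_matmul_add.py`, which is not reproduced here, but a minimal illustration under stated assumptions. The function name, the `matmul_add` label and the trace location are hypothetical. The tensors use the default float32 dtype rather than the bf16 implied by the kernel name. The code falls back to CPU-only profiling when no CUDA device is available.

```python
import os
import tempfile
from typing import Optional

import torch
from torch.profiler import ProfilerActivity, profile, record_function


def profile_matmul_add(size: int = 64, trace_path: Optional[str] = None):
    """Profile one matmul + bias add; return (summary_table, trace_path)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    x = torch.randn(size, size, device=device)
    w = torch.randn(size, size, device=device)
    b = torch.randn(size, device=device)

    with profile(activities=activities) as prof:
        # record_function labels this scope in both the table and the trace.
        with record_function("matmul_add"):  # hypothetical label
            y = x @ w + b
        if device == "cuda":
            # Wait for the GPU so the kernel finishes inside the profiled region.
            torch.cuda.synchronize()

    # Artifact 1: per-operator statistics as a printable table.
    table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)

    # Artifact 2: Chrome-trace JSON, viewable in Perfetto.
    if trace_path is None:
        trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
    prof.export_chrome_trace(trace_path)
    return table, trace_path


table, path = profile_matmul_add(size=64)
print(table)
```

Calling `profile_matmul_add(size=4096)` on a GPU machine gives the larger case to compare against the 64×64 run.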

Key changes

`torch.profiler.profile` with CPU and CUDA activities
`record_function` annotates scopes
Export table via `prof.key_averages().table`
Export trace via `prof.export_chrome_trace`
64×64 matrix is CPU‑bound, <1% GPU time
4096×4096 matrix shifts to compute‑bound, GPU time 4.5 ms
Perfetto UI visualises trace
`schedule(wait=1, warmup=1, active=3)` controls profiling steps
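A minimal sketch of how the schedule in the last item might be wired into a loop, assuming the usual pattern of calling `prof.step()` once per iteration. With `wait=1, warmup=1, active=3`, the first step is skipped, the second is traced but discarded, and the next three are recorded. The function name, tensor sizes and callback are hypothetical and not taken from the article.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule


def run_scheduled_profile(steps: int = 5, size: int = 256):
    """Run `steps` iterations under a wait/warmup/active schedule.

    Returns one summary table per completed active window.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    x = torch.randn(size, size, device=device)
    w = torch.randn(size, size, device=device)
    b = torch.randn(size, device=device)

    tables = []

    def on_trace_ready(prof):
        # Called when an active window finishes recording.
        tables.append(prof.key_averages().table(row_limit=5))

    with profile(
        activities=activities,
        schedule=schedule(wait=1, warmup=1, active=3),
        on_trace_ready=on_trace_ready,
    ) as prof:
        for _ in range(steps):
            y = x @ w + b
            if device == "cuda":
                torch.cuda.synchronize()
            prof.step()  # tell the profiler that one iteration has ended

    return tables


tables = run_scheduled_profile(steps=5)
```

Five steps are exactly enough to complete one wait, warmup and active cycle. A loop that ends partway through an active window may not trigger the callback for that window.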
