What is cuda-kernel-refine?
Iterative CUDA kernel optimization using NVIDIA's profiling tools, Nsight Systems (nsys) and Nsight Compute (ncu). Use when optimizing kernels, improving throughput, reducing memory bandwidth usage, analyzing roofline position, comparing benchmarks, or investigating register pressure and occupancy. Source: trevors/dot-claude.
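As a rough illustration of the roofline analysis mentioned above, the following minimal sketch classifies a kernel as memory-bound or compute-bound from its FLOP count and DRAM traffic. It is not taken from cuda-kernel-refine itself: the function names and the peak hardware figures are hypothetical placeholders, and in practice the FLOP and byte counts would come from profiler measurements rather than hand estimates.

```python
# Minimal roofline sketch. All names and hardware numbers below are
# hypothetical, chosen only for illustration.

def attainable_gflops(flops, bytes_moved, peak_gflops, peak_gbps):
    """Roofline bound: min(compute peak, arithmetic intensity * bandwidth)."""
    intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
    return min(peak_gflops, intensity * peak_gbps)


def bound_kind(flops, bytes_moved, peak_gflops, peak_gbps):
    """Compare arithmetic intensity against the ridge point of the roofline."""
    ridge = peak_gflops / peak_gbps  # intensity where the two roofs meet
    intensity = flops / bytes_moved
    return "memory-bound" if intensity < ridge else "compute-bound"


if __name__ == "__main__":
    # Assumed device: 10,000 GFLOP/s peak compute, 500 GB/s peak bandwidth.
    PEAK_GFLOPS, PEAK_GBPS = 10_000.0, 500.0
    # float32 SAXPY per element: 2 FLOPs, 12 bytes (read x, read y, write y).
    print(bound_kind(2, 12, PEAK_GFLOPS, PEAK_GBPS))
    print(attainable_gflops(2, 12, PEAK_GFLOPS, PEAK_GBPS))
```

A kernel whose arithmetic intensity falls below the ridge point is limited by memory traffic, so reducing bytes moved tends to help more than reducing instruction count; above the ridge point the reverse holds.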