PerfPal is a full-featured performance framework comprising a suite of easy-to-use HPC performance profiling and modeling tools. Unlike many performance analysis frameworks, PerfPal does not require recompiling the application code to collect performance data. Instead, it uses binary instrumentation to modify the production binaries directly so that they emit execution traces. The instrumented binaries can then be run at scale in production computing environments, and the raw traces they produce are analyzed to generate performance reports and viewgraphs. The design of the reports and recommendations is driven by usability lessons that the principals of EP Analytics have learned from working directly with HPC developers across many performance engineering collaborations. Key components and capabilities of PerfPal include the following:
1. PerfPal’s Perfector component is a lightweight profiler that, for each rank/task, breaks the application execution time into computation, MPI communication and I/O times.
2. PerfPal’s VecMeter component gives precise information on the level of vector unit utilization at loop and function levels.
3. PerfPal’s Hot-Path component automatically inserts timers around all functions to help identify key bottlenecks in large HPC codes.
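
As a rough illustration of the per-rank breakdown described in item 1, the sketch below splits each rank's wall-clock time into computation, MPI communication and I/O fractions. It is not PerfPal's report format or API: the `RankProfile` record, its field names and the sample timings are hypothetical, and it assumes computation time is whatever remains after MPI and I/O time are subtracted from the total.

```python
from dataclasses import dataclass


@dataclass
class RankProfile:
    """Hypothetical per-rank timing record (seconds); not a PerfPal data type."""
    rank: int
    total_s: float  # wall-clock time of the rank
    mpi_s: float    # time spent inside MPI calls
    io_s: float     # time spent in I/O

    @property
    def compute_s(self) -> float:
        # Assumption: everything that is not MPI or I/O counts as computation.
        return self.total_s - self.mpi_s - self.io_s


def breakdown(profile: RankProfile) -> dict:
    """Return the fraction of a rank's time spent in each category."""
    return {
        "computation": profile.compute_s / profile.total_s,
        "mpi": profile.mpi_s / profile.total_s,
        "io": profile.io_s / profile.total_s,
    }


def most_mpi_bound(profiles: list) -> int:
    """Return the rank with the largest share of time in MPI."""
    return max(profiles, key=lambda p: p.mpi_s / p.total_s).rank


# Made-up timings, for illustration only.
SAMPLE = [
    RankProfile(rank=0, total_s=100.0, mpi_s=25.0, io_s=5.0),
    RankProfile(rank=1, total_s=100.0, mpi_s=40.0, io_s=10.0),
]

if __name__ == "__main__":
    for p in SAMPLE:
        shares = breakdown(p)
        print(f"rank {p.rank}: " + ", ".join(f"{k} {v:.0%}" for k, v in shares.items()))
    print("most MPI-bound rank:", most_mpi_bound(SAMPLE))
```

Comparing these fractions across ranks is one simple way to spot load imbalance: a rank with an unusually high MPI share is often waiting on slower peers.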
Contact us for more information.