Focus Areas

EP Analytics’ expertise and tools help enterprises maximize the return on investment in their HPC systems. We assist clients with Performance Characterization, Energy Efficiency, System Design, and Emerging Technology Integration.

Performance Analysis

HPC Performance Analysis Tools

EP Analytics engineers apply tools and analysis expertise developed over more than a decade to generate detailed performance analyses and accurate performance predictions of entire workloads for new and existing HPC systems. By characterizing the application workload on an existing system in terms of the fundamental operations it uses, and expressing the ability of a system to perform those operations, EP Analytics can generate a detailed performance analysis of the workload for new or different system architectures. For government agencies or companies contemplating major computing system procurements, the EP Analytics methodology can help reduce risk and maximize the value of the investment by ensuring that the best architecture is chosen for a particular application or workload. The EP Analytics methodology has been successfully used by the Department of Defense's High Performance Computing Modernization Program to guide its system selection process during four major supercomputer procurement cycles for its unclassified and classified research HPC systems.
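As a rough illustration of the idea of pairing a workload's operation counts with a machine's ability to perform those operations, the sketch below estimates runtime as the sum of count divided by rate for each operation type. It is not EP Analytics' actual methodology: the function name `predict_runtime`, the operation categories, and every number are hypothetical, and the model assumes the operations do not overlap in time.

```python
# Hypothetical illustration only: names, operation categories, and all
# numbers below are invented for this sketch, not taken from any real system.

def predict_runtime(op_counts, op_rates):
    """Estimate runtime in seconds as the sum of count / rate per operation.

    Assumes the operations execute without overlapping in time, which is a
    deliberate simplification.
    """
    missing = set(op_counts) - set(op_rates)
    if missing:
        raise KeyError(f"no machine rate for: {sorted(missing)}")
    return sum(count / op_rates[op] for op, count in op_counts.items())


# Application signature: how many of each fundamental operation the
# workload performs (measured once, on an existing system).
app_signature = {"flop": 4.0e12, "mem_byte": 2.0e12, "net_byte": 1.0e10}

# Machine profiles: sustained rate per second for each operation type.
machine_a = {"flop": 1.0e12, "mem_byte": 1.0e11, "net_byte": 1.0e10}
machine_b = {"flop": 2.0e12, "mem_byte": 2.0e11, "net_byte": 5.0e9}

time_a = predict_runtime(app_signature, machine_a)
time_b = predict_runtime(app_signature, machine_b)

print(f"machine A: {time_a:.1f} s, machine B: {time_b:.1f} s")
```

With these made-up numbers, the same application signature is reused against each candidate machine profile, which is what lets a single characterization run inform a comparison across architectures.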


Related Papers & Presentations

June, 2015

VecMeter: Measuring Vectorization on the Xeon Phi

Abstract: Wide vector units in Intel’s Xeon Phi accelerator cards can significantly boost application performance when used effectively. However, there is a lack of performance tools that provide programmers accurate information about the level of vectorization in their codes. This paper presents VecMeter, an easy-to-use tool to measure vectorization on the Xeon Phi. VecMeter utilizes binary instrumentation so no source code modifications are necessary. This paper presents design details of VecMeter, demonstrates its accuracy, defines a metric for quantifying vectorization, and provides an example where the tool can guide optimization of some code sections to improve performance by up to 33%.

Joshua Peraza, Ananta Tiwari, William Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: IEEE Cluster, 2015. Available upon request.

May, 2015

Optimizing Codes on the Xeon Phi: A Case-study with LAMMPS

Abstract: Intel’s Xeon Phi co-processor has the potential to provide an impressive 4 GFlops/Watt while promising users that they need only recompile their code to get it to run on the accelerator. This paper reports our experience running LAMMPS, a widely used molecular dynamics code, on the Xeon Phi and the steps we took to optimize its performance on the device. Using performance analysis tools to pinpoint bottlenecks in the code, we were able to achieve a speedup of 2.8x for the optimized code on the Xeon Phi relative to the original code on the host processors. These optimizations also improved LAMMPS performance on the host, speeding up execution by 7x.

Adam Jundt, Ananta Tiwari, William Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: XSEDE, 2015. Available upon request.

January, 2015

Making the Most of SMT in HPC: System and Application Level Perspectives

Abstract: This work presents an end-to-end methodology for quantifying the performance and power benefits of Simultaneous Multithreading (SMT) for HPC centers and applies this methodology to a production system and workload. Ultimately, SMT’s value system-wide depends on whether users effectively employ SMT at the application level. However, predicting SMT’s benefit for HPC applications is challenging; by doubling the number of threads, the application’s characteristics may change. This work proposes statistical modeling techniques to predict the speedup SMT confers to HPC applications. This approach, accurate to within 8%, uses only lightweight, transparent performance monitors collected during a single run of the application.

Leo Porter, Michael Laurenzano, Ananta Tiwari, Adam Jundt, William Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: TACO (ACM Transactions on Architecture and Code Optimization), 2015. Available upon request.

September, 2014

Using Profiling to Detect Performance Problems: Presentation at HPCMP Frontier Project

August, 2013

Viewing Application/Machine Interactions Through Computational Idioms

Abstract: Models of application behavior are one of the keys to bridging the gap between current large-scale system design practices and upcoming exascale system designs. Processor/accelerator specialization and heterogeneity have been proposed as possible paths forward for attaining the significant energy efficiency improvements necessary to achieve exascale-level computing capabilities within an acceptable power envelope. To have an impact on the exascale system design process, the models must be (1) abstract, containing information that is relevant and actionable across a wide range of programming and execution models and (2) complementary to a well-defined and standardized machine characterization methodology.
We argue that a key component of this modeling paradigm is what we term an idiom, a small computational or memory access pattern. We hypothesize that much of the computational work within HPC can be expressed as the combination of a reasonably small number of basis idioms. Understanding application composition and machine characteristics in terms of how they behave in the presence of (combinations of) this small number of idioms allows us to bridge the gap between large workloads and an increasingly diverse and complex landscape of hardware options.

Michael Laurenzano, Laura Carrington, Ananta Tiwari, Joshua Peraza, William Ward, Jr.†, and Roy Campbell†
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Published in: MODSIM (Workshop on Modeling & Simulation of Systems and Applications, sponsored by the U.S. Department of Energy, Office of Advanced Scientific Computing Research), 2013. Available upon request.

Want to know more about our services and expertise? Contact Us Today