Focus Areas

EP Analytics’ expertise and tools help enterprises maximize the return on investment in HPC systems. We assist clients with Performance Characterization, Energy Efficiency, System Design, and Emerging Technology Integration.

Xeon Phi

Xeon Phi Tools

Development of the General Purpose Graphics Processing Unit (GPGPU) some years ago offered the potential for teraflop-plus computational performance at the workstation level, and GPGPUs have indeed been successfully applied in fields such as computational chemistry and molecular dynamics. However, GPGPU adoption has been severely limited by an arcane programming model and a co-processor-based hardware architecture, which imposes significant limitations through peripheral buses and isolated, non-uniform memory layouts.

Intel’s Xeon Phi architecture, in both its current co-processor-based and forthcoming host socket-based implementations, addresses many of the limitations of GPGPUs while offering high-performance computation via wide vectorization and massive multithreading. The Xeon Phi facilitates application development and porting by providing standard language support (C, C++, etc.), multiple programming models (e.g., native, offload), and extensive compiler support. However, to extract maximum performance from the Xeon Phi, application code typically must be refactored to expose parallelism via vectorization and multithreading. Application analysis tools that can identify key patterns in computational operations and memory accesses are critical for assessing potential performance improvements and developing application refactoring strategies for the Xeon Phi.

For more than a decade, EP Analytics and its principals have been conducting research in the performance modeling and analysis of HPC systems and developing tools for static and dynamic analysis and simulation of such systems based on the x86, Power, and ARM architectures. Recently, the company conducted a study of the Xeon Phi (Knights Corner) architecture and recognized the commercial significance of enabling HPC-based modeling and simulation for advanced manufacturing. We are working to further develop and commercialize our tool suite for use with the Xeon Phi architecture, with the aim of facilitating the porting of HPC modeling and simulation tools to workstation- and small-cluster-based systems. The tools’ utility will not be limited to standalone or small systems, however; they will be equally useful for large-scale and cloud-based HPC systems.


Related Papers & Presentations

June 2015

VecMeter: Measuring Vectorization on the Xeon Phi

Abstract: Wide vector units in Intel’s Xeon Phi accelerator cards can significantly boost application performance when used effectively. However, there is a lack of performance tools that provide programmers with accurate information about the level of vectorization in their codes. This paper presents VecMeter, an easy-to-use tool to measure vectorization on the Xeon Phi. VecMeter utilizes binary instrumentation, so no source code modifications are necessary. This paper presents design details of VecMeter, demonstrates its accuracy, defines a metric for quantifying vectorization, and provides an example where the tool guides optimization of code sections to improve performance by up to 33%.

Joshua Peraza, Ananta Tiwari, William Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: IEEE Cluster, 2015. Available upon request.

May 2015

Optimizing Codes on the Xeon Phi: A Case-study with LAMMPS

Abstract: Intel’s Xeon Phi co-processor has the potential to provide an impressive 4 GFlops/Watt while promising users that they need only recompile their code to run it on the accelerator. This paper reports our experience running LAMMPS, a widely used molecular dynamics code, on the Xeon Phi and the steps we took to optimize its performance on the device. Using performance analysis tools to pinpoint bottlenecks in the code, we achieved a 2.8x speedup for the optimized code on the Xeon Phi over the original code on the host processors. These optimizations also improved LAMMPS performance on the host, speeding up execution by 7x.

Adam Jundt, Ananta Tiwari, William Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: XSEDE, 2015. Available upon request.

September 2014

Using Profiling to Detect Performance Problems: Presentation at HPCMP Frontier Project

September 2014

A Look at Heterogeneous Architectures in HPC: Presentation at NRL

September 2013

Understanding the Performance of Stencil Computations on Intel's Xeon Phi

Abstract: Accelerators are becoming prevalent in high performance computing as a way of achieving increased computational capacity within a smaller power budget. Effectively utilizing the raw compute capacity made available by these systems, however, remains a challenge because porting and optimizing code for novel accelerator hardware can require a substantial investment of programmer time. In this paper we present a methodology for isolating and modeling the performance of common performance-critical patterns of code (so-called idioms) and other relevant behavioral characteristics from large-scale HPC applications that are likely to perform favorably on Intel Xeon Phi. The benefits of the methodology are twofold: (1) it directs programmer efforts toward the regions of code most likely to benefit from porting to the Xeon Phi, and (2) it provides speedup estimates for porting those regions of code. We then apply the methodology to the stencil idiom, showing performance improvements of up to a factor of 4.7x on stencil-based benchmark codes.

Joshua Peraza, Ananta Tiwari, Michael Laurenzano, Laura Carrington, William Ward, Jr.†, and Roy Campbell†
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Published in: IEEE International Conference on Cluster Computing (CLUSTER), 2013. Available at IEEE

Want to know more about our services and expertise? Contact Us Today