Any bottlenecks observed through performance profiling on Summit are likely to also impact scalability on the Frontier-dev system and the Frontier Early Access (EA) system. Additionally, the engineers wanted to take a closer look at the newest NVIDIA profiling tools. This allows us to identify the most useful features of these tools, compare them with new performance analysis tool releases from AMD and Cray, and give feedback to our vendor partners on which features are most important and mission-critical for CAAR efforts. The primary goal of this report is to evaluate PIConGPU’s most time-intensive kernels using NVProf and the NSight Suite. Three kernels, Current Deposition (also known as Compute Current), Particle Push (Move and Mark), and Shift Particles, are known to be some of the most time-consuming kernels in PIConGPU. The Current Deposition and Particle Push kernels both set up the particle attributes for running any physics simulation with PIConGPU, so it is crucial to improve the performance of these two kernels. In this report, we measure single-GPU metrics for the three kernels, offer high-level takeaways from the analysis, and compare the profiling data from NSight Compute to that from NVProf. The analysis was performed using a grid size of 240 x 272 x 224 and 10 time steps, with the Mid-November Figure of Merit (FOM) run setup. The Traveling Wave Electron Acceleration (TWEAC) science case used in this run is a representative science case for PIConGPU. This execution can also be used for baseline analysis on AMD MI50/MI60 systems.
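For a sense of the problem size behind the 240 x 272 x 224 grid and 10 time steps quoted above, the following is a minimal sketch of the arithmetic. The names `GRID`, `TIME_STEPS`, and `cell_count` are illustrative and not taken from PIConGPU. The number of particles per cell, which drives the cost of the three kernels, is not stated in the text and is not modeled here.

```python
# Grid dimensions and step count as quoted in the text.
GRID = (240, 272, 224)
TIME_STEPS = 10


def cell_count(grid):
    """Return the total number of cells in a 3D grid."""
    nx, ny, nz = grid
    return nx * ny * nz


cells = cell_count(GRID)
# Total cell visits over the run, assuming every cell is touched once per step.
cell_updates = cells * TIME_STEPS

print(f"cells per step: {cells:,}")
print(f"cell updates over {TIME_STEPS} steps: {cell_updates:,}")
```

This gives roughly 14.6 million cells on the single GPU under analysis.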
PIConGPU has been selected as one of the eight applications for OLCF’s coveted Center for Accelerated Application Readiness (CAAR) program. The program is aimed at the facility’s Frontier supercomputer (OLCF’s first exascale system, to launch in 2021) and partners with our vendors (primary vendors: AMD and Cray/HPE) to ensure that Frontier will be able to perform large-scale science when it opens to users in 2022. To this effect, performance engineers on the PIConGPU team wanted to dive deep into the application to understand, at the finest granularity, which portions of the code could be further optimized to exploit the hardware on Summit to its maximum potential, and to elucidate which key kernels should be tracked and optimized for the CAAR effort to port this code to Frontier.
PIConGPU, Particle In Cell on GPUs, is an open source simulations framework for plasma and laser-plasma physics used to develop advanced particle accelerators for radiation therapy of cancer, high energy physics and photon science. While PIConGPU has been optimized for at least 5 years to run well on NVIDIA GPU-based clusters, there has been limited exploration by the development team of potential scalability bottlenecks using recently updated and new tools, including NVIDIA’s NVProf tool and the brand-new NVIDIA NSight Suite (Systems and Compute) tools. PIConGPU is a highly optimized application that runs production jobs at scale on the Oak Ridge Leadership Computing Facility’s (OLCF) Summit supercomputer (using the full machine at 4600 nodes at 98% of GPU utilization on all ~28000 NVIDIA Volta GPUs).

Starting with the Titan supercomputer (at the Oak Ridge Leadership Computing Facility, OLCF) in 2012, top supercomputers have increasingly leveraged the performance of GPUs to support large-scale computational science. The No. 1 machine, the 200 petaflop Summit system at OLCF, is a GPU-based machine. Accelerator-based architectures, however, add additional complexity due to node heterogeneity. To inform procurement decisions, supercomputing centers need the tools to quickly model the impact of changes of the node architectures on application performance. We present AHEAD, a profiling and modeling tool to quantify the impact of intra-node communication mechanisms (e.g., PCI or NVLink) on application performance. Our experiments show average weighted relative errors of ~19% and ~23% for five CORAL-2 (a collaboration between multiple US Department of Energy, DOE, labs to procure Exascale systems) and 12 Rodinia benchmarks respectively, without running the applications on the target future ...