Tech: Invited Talk: Improving XSEDE Software Quality using Software Engineering Best Practice
Abstract: XSEDE is introducing a range of system and software engineering practices to achieve systematic and continuous improvement in the quality of its integrated and supported software. This paper will describe XSEDE’s software engineering practices and what we hope to obtain from them. We discuss the technical and cultural challenges of establishing community-defined practices, and the techniques we have been using to address these challenges. We will introduce the initial engineering practices implemented in project year 1, outline additional engineering practice improvements planned during project year 2, and suggest how these engineering practices could be leveraged by the broader XSEDE community.
Tech: A Framework for Federated Two-Factor Authentication Enabling Cost-Effective Secure Access to Distributed Cyberinfrastructure
Abstract: As cyber attacks become increasingly sophisticated, the security measures used to mitigate the risks must also increase in sophistication. One-time password (OTP) systems provide strong authentication because security credentials are not reusable, thus thwarting credential replay attacks. The credential changes regularly, making brute-force attacks significantly more difficult. In high performance computing, end users may require access to resources housed at several different service provider locations. The ability to share a strong token between multiple computing resources reduces cost and complexity. The National Science Foundation (NSF) Extreme Science and Engineering Discovery Environment (XSEDE) provides access to digital resources, including supercomputers, data resources, and software tools. XSEDE will offer centralized strong authentication for services amongst service providers that leverage their own user databases and security profiles. This work implements a scalable framework built on standards to provide federated secure access to distributed cyberinfrastructure.
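
The replay resistance described in this abstract can be illustrated with a minimal time-based OTP sketch in the style of RFC 6238. This is a generic illustration of how a time-derived counter makes each code short-lived, not XSEDE's actual token format or implementation; the shared secret shown is hypothetical.

    import hmac, hashlib, struct, time

    def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
        # HMAC-based one-time password (RFC 4226): hash the counter, then
        # dynamically truncate the digest to a short numeric code.
        digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = digest[-1] & 0x0F
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    def totp(secret: bytes, step: int = 30, digits: int = 6) -> str:
        # Time-based variant (RFC 6238): the counter is the current 30-second
        # interval, so a captured code is useless once the interval passes.
        return hotp(secret, int(time.time()) // step, digits)

    shared_secret = b"example-shared-secret"  # hypothetical; real tokens use a provisioned secret
    print("current OTP:", totp(shared_secret))

Because verification requires only the shared secret and a clock, a central authentication service can validate codes on behalf of many service providers, which is the cost- and complexity-sharing point the abstract makes.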
Tech: Running Many Molecular Dynamics Simulations on Many Supercomputers
Abstract: The challenges facing biomolecular simulations are manifold. In addition to long-time simulations of a single large system, an important challenge is the ability to run a large number of identical copies (ensembles) of the same system. Ensemble-based simulations are important for effective sampling, and due to the low level of coupling between ensemble members, they are good candidates to utilize distributed cyberinfrastructure. The problem for the practitioner is thus effectively marshaling thousands if not millions of high-performance simulations on distributed cyberinfrastructure. Here we assess the ability of an interoperable and extensible pilot-job tool (BigJob) to support high-throughput execution of high-performance molecular dynamics simulations across distributed supercomputing infrastructure. Using a nucleosome positioning problem as an exemplar, we demonstrate how we have addressed this challenge on the TeraGrid/XSEDE. Specifically, we compute 336 independent trajectories of 20 ns each. Each trajectory is further divided into twenty 1 ns long simulation tasks. A single task requires ≈ 42 MB of input, 9 hours of compute time on 32 cores, and generates 3.8 GB of data. In total we have 6,720 tasks (6.7 μs) and approximately 25 TB to manage. There is natural task-level concurrency, as these 6,720 tasks can be executed with 336-way task concurrency. Using NAMD 2.7, this project requires approximately 2 million hours of CPU time and could be completed in just over 1 month on a dedicated supercomputer containing 3,000 cores. In practice even such a modest supercomputer is a shared resource, and our experience suggests that a simple scheme to automatically batch-queue the tasks might require several years to complete the project. In order to reduce the total time-to-completion, we need to scale up, out, and across various resources. Our approach is to aggregate many ensemble members into pilot-jobs, distribute pilot-jobs over multiple compute resources concurrently, and dynamically assign tasks across the available resources.
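
The workload arithmetic quoted in this abstract can be checked with a short sketch using only the figures given there (336 trajectories, 20 tasks each, 9 hours on 32 cores, 3.8 GB per task); the wave-based wall-time estimate at the end is our own simplification and ignores queue wait and scheduling overhead.

    # Figures taken directly from the abstract; variable names are ours.
    trajectories = 336          # independent 20 ns trajectories
    tasks_per_trajectory = 20   # 1 ns simulation tasks per trajectory
    cores_per_task = 32
    hours_per_task = 9
    gb_per_task = 3.8

    total_tasks = trajectories * tasks_per_trajectory             # 6,720 tasks
    total_core_hours = total_tasks * cores_per_task * hours_per_task
    total_data_tb = total_tasks * gb_per_task / 1024              # ~25 TB

    # Idealized wall time on a dedicated 3,000-core machine, assuming tasks
    # run in back-to-back waves of 3000 // 32 = 93 concurrent tasks.
    concurrent_tasks = 3000 // cores_per_task
    waves = -(-total_tasks // concurrent_tasks)                   # ceiling division
    wall_days = waves * hours_per_task / 24

    print(total_tasks, total_core_hours, round(total_data_tb, 1), round(wall_days, 1))
    # 6720 1935360 24.9 27.4  -- i.e. ~2 million CPU hours, ~25 TB, roughly a month

These numbers agree with the abstract's estimates; the point of the pilot-job approach is that on shared machines with queue waits this idealized month stretches toward years unless tasks are aggregated into pilot-jobs and spread across multiple resources.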
Tech: A Systematic Process for Efficient Execution on Intel's Heterogeneous Computation Nodes
Abstract: Heterogeneous architectures (mainstream CPUs with accelerators / co-processors) are expected to become more prevalent in high performance computing clusters. This paper deals specifically with attaining efficient execution on nodes that combine Intel's multicore Sandy Bridge chips with MIC manycore chips. The architecture and software stack for Intel's heterogeneous computation nodes attempt to make migration from the now-common multicore chips to the manycore chips straightforward. However, the manycore chips favor specific execution characteristics, such as use of the wider vector instructions and minimal inter-thread conflicts. Additionally, manycore chips have lower clock speeds and no unified last-level cache. As a result, and as we demonstrate in this paper, it will commonly be the case that not all parts of an application will execute more efficiently on the manycore chip than on the multicore chip. This paper presents a process, based on measurements of execution on Westmere-based multicore chips, which can accurately predict which code segments will execute efficiently on the manycore chips, and illustrates and evaluates its application to three substantial full programs -- HOMME, MOIL and MILC. The effectiveness of the process is validated by verifying the scalability of the specific functions and loops that were recommended for MIC execution on a Knights Ferry computation node.
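
To make the idea of measurement-driven prediction concrete, the following is a minimal sketch of how host-side profiles might be scored to decide which segments to run on the manycore chip. The metric names, weighting, and threshold are illustrative assumptions, not the model developed in the paper.

    from dataclasses import dataclass

    @dataclass
    class SegmentProfile:
        # Per-segment measurements gathered on the multicore host chip.
        # These fields are assumed for illustration, not the paper's metrics.
        name: str
        vector_efficiency: float    # 0..1, fraction of work using vector units
        thread_scalability: float   # parallel efficiency at high thread counts
        llc_miss_rate: float        # last-level cache misses per instruction

    def likely_efficient_on_mic(seg: SegmentProfile, clock_ratio: float = 0.4) -> bool:
        # Flag a segment for the manycore chip only if the projected gain from
        # wider vectors and more threads outweighs the lower clock speed and
        # the lack of a shared last-level cache (hypothetical weighting).
        projected_gain = seg.vector_efficiency * seg.thread_scalability
        penalty = clock_ratio * (1.0 + 10 * seg.llc_miss_rate)
        return projected_gain > penalty

    for seg in [SegmentProfile("stencil_loop", 0.90, 0.85, 0.01),
                SegmentProfile("table_lookup", 0.20, 0.60, 0.05)]:
        print(seg.name, "->", "MIC" if likely_efficient_on_mic(seg) else "host CPU")

In this toy example a vectorizable, scalable loop is routed to the manycore chip while a poorly vectorized, cache-unfriendly segment stays on the host, mirroring the abstract's observation that not all parts of an application benefit from the manycore chip.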