Sign up or log in to see what your friends are attending and create your own schedule!

View analytic


Tech: Running Many Molecular
    Wednesday July 18, 2012 2:15pm - 2:45pm @ Camelot 3rd Floor

    Tech: Running Many Molecular Dynamics Simulations on Many Supercomputers

    Abstract: The challenges facing biomolecular simulations are manyfold. In addition to long time simulations of a single large system, an important challenge is the ability to run a large number of identical copies (ensembles) of the same system. Ensemble-based simulations are important for effective sampling and due to the low-level of coupling between them, ensemble-based simulations are good candidates to utilize distributed cyberinfrastructure. The problem for the practitioner is thus effectively marshaling thousands if not millions of high-performance simulations on distributed cyberinfrastructure. Here we assess the ability of an interoperable and extensible pilot- job tool (BigJob), to support high-throughput simulations of high- performance molecular dynamics simulations across distributed supercomputing infrastructure. Using a nucleosome positioning problem as an exemplar, we demonstrate how we have addressed this challenge on the TeraGrid/XSEDE. Specifically, we compute 336 independent trajectories of 20 ns each. Each trajectory is further divided into twenty 1 ns long simulation tasks. A single task requires ≈ 42 MB of input, 9 hours of compute time on 32 cores, and generates 3.8 GB of data. In total we have 6,720 tasks (6.7 μs ) and approximately 25 TB to manage. There is natural task-level concurrency, as these 6,720 can be executed with 336-way task concurrency. Using NAMD 2.7, this project requires approximately 2 million hours of CPU time and could be completed in just over 1 month on a dedicated supercomputer containing 3,000 cores. In practice even such a modest supercomputer is a shared resource and our experience suggests that a simple scheme to automatically batch queue the tasks, might require several years to complete the project. In order to reduce the total time-to-completion, we need to scale-up, out and across various resources. Our approach is to aggregate many ensemble members into pilot-jobs, distribute pilot-jobs over multiple compute resources concurrently, and dynamically assign tasks across the available resources.



    Type Technology Track
    Session Titles Software and Middleware

Get Adobe Flash player