11:30am

Software: The Prickly Pear
    Tuesday July 17, 2012 11:30am - 12:00pm @ Toledo 5th Floor

    Software: The Prickly Pear Archive: A Portable Hypermedia for Scholarly Publication

    Abstract: An executable paper is a hypermedia for publishing, reviewing, and reading scholarly papers which include a complete HPC software development or scientific code. A hypermedia is an integrated interface to multimedia including text, figures, video, and executables, on a subject of interest. Results within the executable paper include numeric output, graphs, charts, tables, equations and the underlying codes which generated such results. These results are dynamically regenerated and included in the paper upon recompilation and re-execution of the code. This enables a scientifically enriched environment which functions not only as a journal but as a laboratory in itself, in which readers and reviewers may interact with and validate the results.
    The Prickly Pear Archive (PPA) is such a system[1]. One distinguishing feature of the PPA is the inclusion of an underlying component-based simulation framework, Cactus[3], which simplifies the process of composing, compiling, and executing simulation codes. Code creation is simplified by using common bits of infrastructure; each paper augments the functionality of the framework. Other distinguishing features include the portability and reproducibility of the archive, which allow researchers to re-create the software environment in which the simulation code was created.
    A PPA production system hosted on HPC resources (e.g. an XSEDE machine) unifies the computational scientific process with the publication process. A researcher may use the production archive to test simulations; upon arriving at a scientifically meaningful result, the user may then incorporate the result in an executable paper on the very same resource on which the simulation was conducted. Housed within a virtual machine, the PPA allows multiple accounts within the same production archive, enabling users across campuses to bridge their efforts in developing scientific codes.
    The executable paper incorporates into its markup code references to manipulable parameters, including symbolic equations, which the simulation executable accepts as input. An interface to this markup code enables authors, readers, and reviewers to control these parameters and re-generate the paper, potentially arriving at a novel result. Thus, the executable paper functions not only as a publication in itself, but also as an interactive laboratory from which novel science may be extracted. One can imagine the executable paper environment, encapsulated and safeguarded by a virtual machine, as a portable laboratory in which the computational scientist arrived at the result. The notion of an executable paper is particularly useful in the context of computer and computational science, where the code underlying a (scientific) software development is of interest to the wider development community.
    Why are executable papers to be preferred over traditional papers? As Gavish et al. have observed[2], the current workflow in the life cycle of a traditional paper may be summarized in the following five steps:
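The idea of parameter references embedded in the paper's markup can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `{{param:...}}` and `{{result:...}}` placeholder syntax, the `regenerate` function, and the toy `simulate` routine are hypothetical and are not the PPA's actual markup, interface, or simulation code.

```python
import re

# Hypothetical paper source. {{param:NAME}} marks an adjustable input and
# {{result:NAME}} marks a value that is recomputed on every regeneration.
PAPER = (
    "With decay rate k = {{param:k}} and time step dt = {{param:dt}}, "
    "the amplitude after {{param:steps}} steps is {{result:amplitude}}."
)

# Default parameter values chosen by the (hypothetical) author.
DEFAULTS = {"k": 0.5, "dt": 0.1, "steps": 10}


def simulate(k, dt, steps):
    """Toy stand-in for a simulation executable: explicit Euler for y' = -k*y."""
    y = 1.0
    for _ in range(int(steps)):
        y += dt * (-k * y)
    return {"amplitude": y}


def regenerate(source, overrides=None):
    """Re-run the simulation and fill every placeholder in the paper text."""
    params = {**DEFAULTS, **(overrides or {})}
    results = simulate(**params)
    text = re.sub(r"\{\{param:(\w+)\}\}",
                  lambda m: str(params[m.group(1)]), source)
    return re.sub(r"\{\{result:(\w+)\}\}",
                  lambda m: f"{results[m.group(1)]:.4f}", text)


if __name__ == "__main__":
    print(regenerate(PAPER))              # the paper as the author wrote it
    print(regenerate(PAPER, {"k": 1.0}))  # a reader varies one parameter
```

A reader who overrides `k` gets a paper whose stated result changes accordingly, which is the interaction the paragraph above describes, here reduced to a single sentence of text.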
    1. Store a private copy of the original data to be processed on the local machine.
    2. Write a script or computer program, with hard-coded tuning parameters, to load the data from the local file, analyze it, and output selected results (a few graphical figures, tables, etc) to the screen or to local files.
    3. Withhold the source code that was executed, and the copy of the original data that was used, and keep them in the local file system in a directory called e.g. “code-final.”
    4. Copy and paste the results into a publication manuscript containing a textual description of the computational process that presumably took place.
    5. Submit the manuscript for publication as a package containing the word processor source file and the graphical figure files.
    Three issues with this workflow lie with the communication, validation, and reproducibility of the method used to arrive at the result. The novel execution-based publication paradigm for computer and computational science has a few advantages over the traditional paper-based paradigm which allow it to overcome these issues.
    The first advantage is the enhanced explanatory power offered by multimedia such as graphs, charts, and numeric values which are generated from manipulable parameters. The reader and reviewer may view the parameters to see how such figures were arrived at. They may also vary the parameters, re-execute the code, and witness changes to the media themselves. This level of interactivity offered by the executable archive allows the audience to understand the simulation by allowing them to conduct it first-hand.
    The second advantage is the increased level of validation and peer review of the experimental method, owing to the inclusion of the code and logs used to arrive at a result, which is essential for computational science to self-correct. Though a computational scientist may reproduce a simulation code according to a natural-language or pseudocode description, implementation differences (minutiae such as the order of nested loops, the order of computations, and the language constructs used) may have a dramatic impact on the result. Were these details subject to review, inefficiencies and errors could be more easily corrected. Fortunately, since the experimental apparatus in computational science is digitized, it is potentially easier for it to achieve this level of validation relative to paper-based media.
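As a small illustration of how the order of computations alone can change a numeric result, the following sketch sums the same four floating-point values in two different orders. The values and function name are chosen for illustration only and do not come from any PPA or Cactus code; `math.fsum` from the Python standard library serves as a correctly rounded reference.

```python
import math

# The same four numbers; only the order of accumulation will differ.
values = [1e16, 1.0, -1e16, 1.0]


def sum_left_to_right(xs):
    """Naive accumulation in the order given, as a simple loop would do it."""
    total = 0.0
    for x in xs:
        total += x
    return total


forward = sum_left_to_right(values)            # order as written
reordered = sum_left_to_right(sorted(values))  # ascending order
reference = math.fsum(values)                  # correctly rounded sum

if __name__ == "__main__":
    print("as written:", forward)
    print("sorted:    ", reordered)
    print("math.fsum: ", reference)
```

The two naive sums disagree with each other and with the reference because a small term is lost whenever it is added to a much larger partial sum. A pseudocode description that says only "sum the values" cannot capture which of these results a given implementation produced; the code itself can.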
    The third is reproducibility, which supports efforts to modify or extend the computational method. In other sciences, the experimental method is fleshed out in the traditional paper in such detail that anyone with the proper equipment may repeat it and expect the same results. Perhaps the computational scientist may be able to supply pseudocode in the traditional paper for others to implement. However, reproducing codes (especially simulation codes) from pseudocode is a laborious and time-consuming process. Even when such reproduction is completed, gaining access to the proper equipment is often as problematic for the computational scientist as for any other, as it requires allocations to high-security supercomputers and time invested in building the software dependencies necessary to run the code.
    Given the digital nature of computational science, it seems that communicating, validating, and reproducing experimental work should be easier. To facilitate the process of extending or further investigating a computational model, ideally modifiable code would come with the paper, along with the process of building and executing it (in the form of a script) and the total environment in which the code was written, built, and executed. In addition, the paper results would be dynamically generated upon recompilation and execution following any modifications to the code.
    The latter scenario may seem like a distant possibility, but the production of many executable archives is already underway. One such archive is the Prickly Pear Archive, an executable paper journal which uses the Cactus Computational Toolkit as its underlying framework. The award-winning Cactus framework is used for general problem-solving on regular meshes, and is used across disciplines including numerical relativity, astrophysics, and coastal science.
    Figure 1: The PPA workflow integrates several components through its PHP interface, including Cactus, Kranc, SimFactory and LaTeX.
    In addition to Cactus, the PPA interfaces with several other components, including: SimFactory, a set of utilities for building and running Cactus applications and managing Cactus jobs; Kranc, a script for creating Cactus thorns from equations rendered in a Mathematica-style language; the Piraha parsing expression grammar for parsing parameter and configuration files; the LaTeX markup language; and an encapsulating virtual machine which enables portability and rep

    ...

    Speakers

    Type Software and Software Environments Track
 
