Tech: The Data Supercell
    Thursday July 19, 2012 9:45am - 10:15am @ Camelot 3rd Floor

    Abstract: In April of 2012, the Pittsburgh Supercomputing Center unveiled a unique mass storage platform named 'The Data Supercell'. The Data Supercell (DSC) is based on the SLASH2 filesystem, also developed at PSC, and incorporates multiple classes of systems into its environment for the purposes of aiding scientific users and storage administrators. 

    The Data Supercell aims to play a major role in the XSEDE storage ecosystem. Besides serving as PSC's mass storage system, DSC exposes the novel features of the SLASH2 filesystem. Outfitted with these features, DSC seeks to provide a new class of integrated storage services for users of large scientific data and for XSEDE resource providers.

    This submission will cover most aspects of the DSC, including:

    * Software and Hardware Architecture 

    This section explains the types of storage systems that compose the DSC, with emphasis on the heterogeneous nature of the assembly. In particular, the SLASH2 I/O service is highly portable across system classes and operating systems. We will detail how this portability was instrumental in constructing the DSC by enabling the inclusion of a legacy tape system alongside dense storage bricks running ZFS.

    * Performance Analysis of DSC 

    The community will be interested in the performance of any new distributed / parallel filesystem. Since DSC is the flagship SLASH2 deployment at this time, we shall report I/O performance measurements for data and metadata operations in this submission.

    * Novel features such as File Replication and Poly-residencies 

    At this time, DSC is primarily used as PSC's mass storage system; however, the system has interesting capabilities which extend beyond the features of a traditional archiver. Among these is the ability to move data in parallel between a scratch filesystem (e.g., Lustre or GPFS) and the highly dense storage nodes. Further, such features can be invoked by normal users of the system, allowing them to transfer data between mass storage and any parallel filesystem with exceptional performance.


    SLASH2's file replication capabilities allow users and administrators to determine the layout, residency, and number of replicas on a per-block basis or for a whole file. Our paper will illustrate these capabilities as used on the DSC.

    * Upcoming Integrated scientific cloud data services 

    Here we shall describe how existing and upcoming SLASH2 features will be used to aid XSEDE's large data users. This will focus on users and research groups who have considerable on-campus storage and compute resources and who frequently operate on XSEDE resources. The section will describe in detail the vision of incorporating data replication, user-specific eventually consistent metadata volumes, data multi-residency, and system-managed parallel replication to create tightly integrated storage environments between large-scale campus and XSEDE RP resources.



    Professor of Physics and Director, Pittsburgh Supercomputing...

    Type: Technology Track
    Session Titles: File Systems
    Tags: File Systems
