BOF: MapReduce and Data-Intensive Applications
Abstract: We are in an era of data deluge, and future success in science depends on the ability to leverage large-scale data. This proposal follows up the successful first meeting in this series, "MapReduce Applications and Environments," held at TeraGrid 2011, and we will use it to kick-start an XSEDE forum. It aligns directly with several NSF goals, including the Cyberinfrastructure Framework for 21st Century Science and Engineering (CF21) and Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA). In particular, MapReduce-based programming models and runtime systems such as the open-source Hadoop system have increasingly been adopted by researchers in the HPC, Grid, and Cloud communities working on data-intensive problems in areas including bioinformatics, data mining and analytics, and text processing. While MapReduce runtime systems such as Hadoop are not currently supported across XSEDE systems (Hadoop is available on some systems, including FutureGrid), demand for these environments from the science community is increasing. This BOF session will provide a forum for discussion with users on the challenges and opportunities of using MapReduce as an interoperable framework across HPC, Grid, and Cloud environments.