Science: Comparing the performance of group detection algorithm in serial and parallel processing environments
Abstract: Developing an algorithm for group identification from a collection of individuals without grouping data has been getting significant attention because of the need for increased understanding of groups and teams in online environments. This study used space, time, task, and players’ virtual behavioral indicators from a game database to develop an algorithm to detect groups over time. The group detection algorithm was primarily developed for a serial processing environment and later then modified to allow for parallel processing on Gordon. For a collection of data representing 192 days of game play (approximately 140 gigabytes of log data), the computation required 270 minutes for the major step of the analysis when running on a single processor. The same computation required 22 minutes when running on Gordon with 16 processors. The provision of massive compute nodes and the rich shared memory environment on Gordon has improved the performance of our analysis by a factor of 12. Besides demonstrating the possibility to save time and effort, this study also highlights some lessons learned for transforming a serial detection algorithm to parallel environments.