Science: Multiple Concurrent Queries on Demand: Large Scale Video Analysis in a Flash Memory Environment as a Case for Humanities Supercomputing
Abstract: [Please see the attached pdf for this abstract with figures integrated, as well as additional author names.]
The Large Scale Video Analytics (LSVA) research project is a newly supported effort that explores the viability of a human-machine hybrid approach to managing massive video archives for the purposes of research and scholarship. Video databases are characterized by incomplete metadata and wildly divergent content tags. Machine reading has a low efficacy rate, while human tagging is generally too labor-intensive to be viable. The LSVA project will therefore develop a prototype that integrates multiple algorithms for image recognition, scene completion, and crowd-sourced image tagging, so that the system grows smarter and more valuable with increased usage. Building on interdisciplinary research in the humanities and social sciences on one hand (film theory, collective intelligence, visual communication), and computer science on the other (signal processing, large feature extraction for machine reading, algorithmic pattern recognition), the LSVA project will enable researchers to test different algorithms by placing them into a workflow, applying them to the same video dataset in real time, and finally analyzing the results using cinematic theories of representation.
Currently, understanding and using the content of a large video archive is time-consuming and laborious. Besides the sheer size of the archives, the key challenges to analyzing them effectively are limited metadata and the lack of a precise understanding of what the archive actually contains.
For many years, scholars have required high-performance computing resources for analyzing and examining digital videos. However, due to usage policies and technical limitations, supercomputers have required scholars to work in a batch-oriented workflow. A batch-oriented workflow is at odds with the typical workflow of scholars, which is exploratory and iterative: the results of one query inform the next. Batch processing interrupts this cycle and can hinder rather than help discovery.
The arrival of the XSEDE resource “Gordon”, a supercomputer with extensive flash memory, transformed the relationship between this exploratory research method and HPC. Its architecture makes it possible for researchers to query large databases, including databases of digital videos, interactively and on demand in real time. Additionally, Gordon has sufficient computational capability to analyze video assets extensively in real time when determining which videos to return in response to a query. This is a computationally intensive process involving queries that cannot be anticipated ahead of time.
This project will use the Gordon supercomputer not only to pre-process videos and automatically extract meaningful metadata, but also as an interactive engine that lets researchers generate queries on the fly when the metadata extracted a priori is not sufficient. To make the system useful to researchers, we are combining an interactive database, a robust web-based front-end (Medici), and powerful visualizations that help researchers understand the contents of the video footage without watching every frame of every movie. Given that YouTube alone holds more video than one could ever view in a lifetime, with more added daily there and on other video hosting sites, the need for, and the implications of, this type of meta-level analysis are great indeed.
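The two roles described above (metadata extracted ahead of time, plus on-demand computation when that metadata is not enough) can be illustrated with a minimal Python sketch. It is not the LSVA implementation: the video ids, the feature names, the per-frame brightness values, and the functions `feature_value` and `query` are all hypothetical and chosen only to show the control flow.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for metadata extracted a priori.
# Keys are video ids; values map a feature name to its precomputed value.
PRECOMPUTED: Dict[str, Dict[str, float]] = {
    "video_a": {"mean_brightness": 0.72},
    "video_b": {"mean_brightness": 0.31},
}

# Hypothetical raw data: each video reduced to per-frame brightness values.
FRAMES: Dict[str, List[float]] = {
    "video_a": [0.70, 0.74, 0.72],
    "video_b": [0.10, 0.52, 0.31],
}


def brightness_range(frames: List[float]) -> float:
    """Example of an analysis nobody anticipated during pre-processing."""
    return max(frames) - min(frames)


# Analyses that can be run on demand, keyed by feature name (assumed names).
ON_DEMAND: Dict[str, Callable[[List[float]], float]] = {
    "brightness_range": brightness_range,
}


def feature_value(video_id: str, feature: str) -> float:
    """Return a feature, computing and caching it if it was not precomputed."""
    cached = PRECOMPUTED[video_id]
    if feature not in cached:
        cached[feature] = ON_DEMAND[feature](FRAMES[video_id])
    return cached[feature]


def query(feature: str, predicate: Callable[[float], bool]) -> List[str]:
    """Return the ids of all videos whose feature value satisfies the predicate."""
    return [
        video_id
        for video_id in sorted(PRECOMPUTED)
        if predicate(feature_value(video_id, feature))
    ]


# A query answered from precomputed metadata alone.
bright_videos = query("mean_brightness", lambda v: v > 0.5)
# A follow-up query that needs a feature computed on the fly.
high_contrast_videos = query("brightness_range", lambda v: v > 0.2)
```

In this sketch the second query triggers analysis of the raw frames because `brightness_range` was never precomputed, and the result is cached so that later queries on the same feature are answered from metadata. On a real archive the on-demand step is the computationally intensive part that motivates interactive access to a machine like Gordon.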
Due to the need for a high-quality end-user experience (low latency and high throughput), the LSVA project has dedicated, interactive access to Gordon’s I/O nodes. In the first phase of this project, the database and video archive will reside on Gordon’s I/O node and Lustre file system. In the future, we will experiment with federated databases located at different sites across the country.
This work builds on the NCSA Medici system as the front-end that the user interacts with (see Fig. 2). Medici is well equipped to let automated processes be dropped into a technology-supported workflow. Medici also provides easy tagging and grouping of data elements using an RDF model on the back end.
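To illustrate how tagging and grouping fit an RDF-style model, the following minimal Python sketch stores subject-predicate-object triples and answers pattern queries over them. It does not use or reproduce Medici's actual API or vocabulary: the `TripleStore` class, the predicate names `hasTag` and `memberOf`, and the identifiers are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

# An RDF-style statement: (subject, predicate, object).
Triple = Tuple[str, str, str]


class TripleStore:
    """A tiny in-memory triple store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one statement; adding the same statement twice has no effect."""
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# Tagging: attach free-form tags to individual videos.
store.add("video_a", "hasTag", "crowd")
store.add("video_b", "hasTag", "crowd")
store.add("video_b", "hasTag", "night")
# Grouping: place videos into a named collection.
store.add("video_a", "memberOf", "newsreels")

# Which videos carry the tag "crowd"?
crowd_videos = [s for s, _, _ in store.match(predicate="hasTag", obj="crowd")]
# Everything recorded about video_b.
video_b_facts = store.match(subject="video_b")
```

Because tags and groups are just additional statements, machine-generated and crowd-sourced annotations can share one store and be queried with the same pattern-matching call.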
Conclusions
Though this project is in its preliminary stages, we are enthusiastic and confident about building an on-demand interactive query engine for video archives and designing a user interface with appropriate visualizations to support real-time video analysis and querying. Ultimately, we hope to turn this system into a science gateway that can be used by the community of film scholars, social scientists, computer scientists, and artists.