Technology Research and Development #1

A software environment for bio-NMR Data Processing and Analysis: NMRbox

Summary

The broad aim of TRD1 is to utilize existing and emerging virtualization technology to simplify the development, installation, distribution, and maintenance of the complex software environment needed to support bio-NMR studies. This platform is designed to foster the reproducibility of bio-NMR studies by providing persistent access to fully configured computational resources.

NMRbox users and developers

Overview

Virtualization refers to methods for simulating a hardware and/or software environment. Modern processors are designed with virtualization support built in, allowing “guest” virtual machines (VMs) to run at near-native performance on a “host” computer. The VMs may be downloaded and executed locally, or accessed remotely as a cloud-based Platform-as-a-Service (PaaS) through virtual network computing (VNC). Virtualization is a robust technology that delivers multiple benefits: (1) a zero-configuration computing platform, (2) easy access to distributed computing, (3) simplified software development and maintenance on a single target OS, (4) efficient sharing of computational resources, (5) simplified system administration, and (6) long-term persistence of software that depends on deprecated or obsolete OSs.

Innovative Ideas

NMRbox provides a persistent computing platform that both maintains access to previously developed software packages and fosters reproducible research, a hallmark of the scientific process that is often not achieved. The spectroscopist benefits from a zero-configuration platform provisioned with easy-to-discover software and zero-cost access to significant computational resources. The software developer is able to focus development and support on a single target OS, and gains an enhanced ability to build software tools that integrate multiple primary software packages already managed by the Center.

Key Objectives

Develop the NMRbox virtual machine provisioned with software utilized in NMR data processing and analysis. Software included in NMRbox will be determined by community polls and the needs of the DBPs and Collaborators, in consultation with the External Advisory Board (EAB). Software developers will be engaged in both the provisioning process and the Center’s training and dissemination activities.

Develop a distribution platform for the NMRbox VM and its associated resources. This platform, a website, will allow users and software developers to create accounts, download VMs, manage individual PaaS VMs, discover software, and access training information.

Develop a system for passing tasks that benefit from parallelization from PaaS VMs to distributed computing clusters. Many NMR software applications, such as those performing structure calculations and maximum entropy reconstructions of NMR data, are embarrassingly parallel: the work divides into independent jobs that are simple to distribute across computer clusters. We will develop tools within NMRbox to allow near-seamless integration between the PaaS VMs and in-house compute clusters.
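
As an illustration of why such workloads distribute so easily, the following is a minimal Python sketch, not the NMRbox implementation. It assumes each job (for example, one structure calculation from a different random start) is fully independent of the others. The function names `run_one_calculation` and `run_ensemble` are hypothetical, the numerical "score" is a placeholder for real work, and a local thread pool stands in for a compute cluster.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
import random


def run_one_calculation(seed: int) -> tuple[int, float]:
    """Hypothetical stand-in for one independent job.

    A real job might be a single structure calculation started from a
    random seed; here a placeholder score is computed instead.
    """
    rng = random.Random(seed)  # seeded, so each job is reproducible
    score = sum(rng.random() for _ in range(1000))
    return seed, score


def run_ensemble(seeds, executor: Executor) -> tuple[int, float]:
    """Run every job on the given executor and return the lowest-scoring one.

    The jobs share no state, so they can run in any order on any worker.
    """
    results = list(executor.map(run_one_calculation, seeds))
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Assumption: a local thread pool stands in for a remote cluster.
    with ThreadPoolExecutor(max_workers=4) as pool:
        best_seed, best_score = run_ensemble(range(20), pool)
    print(f"best seed: {best_seed}, score: {best_score:.3f}")
```

Because `run_ensemble` depends only on the standard `Executor` interface, the same calling code could in principle be pointed at a cluster-backed executor instead of a local pool, which is the kind of hand-off from a VM to a cluster that this objective describes.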