PNNL - SciDB Case Study
Pacific Northwest National Laboratory (PNNL) is a Department of Energy (DOE) Office of Science national laboratory where interdisciplinary teams advance science and technology and deliver solutions to America's most intractable problems in energy, national security, and the environment. PNNL employs 4,200 staff, has an $850 million annual budget, and has been managed by Ohio-based Battelle since the Lab's inception in 1965.
PNNL is currently evaluating SciDB's role in both the atmospheric and biological domains, and is contributing to the design and implementation of SciDB to support these domains.
In the atmospheric domain, data from the ground-based sensors of the DOE-funded Atmospheric Radiation Measurement (ARM) Program, along with satellite and model data from other atmospheric programs, is well suited to SciDB's design.
Through ARM, DOE funded the development of several highly instrumented ground stations, two mobile facilities, and an aerial facility for studying cloud formation processes and their influence on radiative transfer, and for measuring other parameters that determine the radiative properties of the atmosphere. This scientific infrastructure and the resulting data archive are available to scientists worldwide through the ARM Climate Research Facility. This user facility has enormous potential to advance scientific knowledge in a wide range of interdisciplinary earth sciences.
ARM data is currently archived in daily netCDF files, which contain time-varying n-D arrays of data along with the associated metadata and provenance data. The netCDF file structure will map easily into the SciDB array data model. With its shared-nothing parallel architecture, its extensibility, and its ability to store thumbnail images of the scientifically relevant data for quick review, SciDB appears to be a good fit for atmospheric data.
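The netCDF-to-SciDB mapping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not PNNL's or SciDB's actual loader: `NcVariable` and `scidb_schema` are hypothetical names, the variables and dimension sizes are invented stand-ins for an ARM daily file rather than a real ARM product, and the emitted schema assumes SciDB's classic `name=low:high,chunk,overlap` dimension syntax, which newer releases may spell differently. The idea shown is that netCDF variables sharing the same dimensions become attributes of one SciDB array, and each netCDF dimension becomes an array dimension.

```python
from dataclasses import dataclass


@dataclass
class NcVariable:
    """Hypothetical stand-in for a netCDF variable: name, type, named dimensions."""
    name: str
    dtype: str                # target SciDB attribute type, e.g. "double"
    dims: tuple[str, ...]     # ordered dimension names, as in the netCDF header


def scidb_schema(array_name, variables, dim_sizes, chunk=1000):
    """Build a CREATE ARRAY statement in which each netCDF variable becomes an
    attribute and each netCDF dimension becomes an array dimension 0..size-1.
    Assumes the classic SciDB syntax dim=low:high,chunk_interval,overlap."""
    if not variables:
        raise ValueError("at least one variable is required")
    dims = variables[0].dims
    if any(v.dims != dims for v in variables):
        raise ValueError("all variables in one array must share dimensions")
    attrs = ", ".join(f"{v.name}:{v.dtype}" for v in variables)
    # Chunk interval is capped at the dimension length; overlap is fixed at 0.
    dim_specs = ", ".join(
        f"{d}=0:{dim_sizes[d] - 1},{min(chunk, dim_sizes[d])},0" for d in dims
    )
    return f"CREATE ARRAY {array_name} <{attrs}> [{dim_specs}];"


if __name__ == "__main__":
    # Hypothetical daily file: one-minute samples on 50 height levels.
    sizes = {"time": 1440, "height": 50}
    profile_vars = [
        NcVariable("temperature", "double", ("time", "height")),
        NcVariable("humidity", "double", ("time", "height")),
    ]
    print(scidb_schema("arm_profile", profile_vars, sizes))
    # CREATE ARRAY arm_profile <temperature:double, humidity:double> [time=0:1439,1000,0, height=0:49,50,0];
```

Grouping same-shaped variables into one array means co-located measurements share coordinates and chunking, so they can be queried together. An archive spanning many daily files would likely treat time as a growing dimension rather than fixing it at one day's length.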
In biology, the extensive mass spectrometry capabilities at the Environmental Molecular Sciences Laboratory (EMSL), a DOE national scientific user facility located at PNNL, enable high-throughput, high-resolution analysis of complex mixtures. These resources are applied to a broad range of scientific studies related to human health and the environment, including the climate. Biologists and bioinformaticists rely on cutting-edge proteomics tools to perform complex analyses such as protein quantification and the characterization of protein-protein complexes. To support these analyses, experimentalists and downstream users rely on a data-management facility that consists of large-scale experimental relational databases and a petascale archive.





