LSST - SciDB Case Study
The Large Synoptic Survey Telescope (LSST) is a ground-based telescope that will repeatedly photograph the entire available sky using a 3.2-gigapixel camera and an 8.4-meter mirror. Over its projected 10-year lifetime, the survey will generate over 100 petabytes of data, including 55 petabytes of raw images and multi-petabyte catalogs describing 50 billion astronomical objects, more than a hundred trillion detections, and associated complex metadata. Scalable, fast access to this data is essential to enable exploration, experimentation, and discovery by professional astronomers, students, and the public.
SciDB's features are a good match for the LSST data management requirements. Specifically, its array data model can store the sequence of raw images as a 3-D array and the catalogs as 1-D arrays, using nested arrays to manage the time series of detection properties and image thumbnails. Its shared-nothing parallel architecture appears to be the best way to provide rapid access to all the data, and it runs on commodity hardware that can be scaled with the load. SciDB will support provenance tracking, which is required to provide fault tolerance, to enable the regeneration of unsaved intermediate data products, and to indicate to users the degree of processing and hence the implied data quality. Built-in uncertainty modeling simplifies the expression of common astronomical queries. SciDB's fault tolerance and automatic management will reduce the cost of administering LSST's petascale data set. SciDB is also likely to enable new ways of exploring and analyzing the data that have historically been impractical to consider.
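The array layout described above can be pictured with a small sketch. This is plain Python standing in for the data model, not SciDB's query language or API; the class names, field names, and tiny array dimensions are all hypothetical and chosen only for illustration. It shows raw images as a 3-D array indexed by (exposure, row, column), a catalog as a 1-D array of objects, and a nested per-object time series of detections, each carrying a flux, an uncertainty, and a thumbnail cut from the image stack.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical toy dimensions; real LSST images are 3.2 gigapixels each.
N_EXPOSURES, HEIGHT, WIDTH = 3, 4, 4


def make_image_stack(n: int, h: int, w: int) -> List[List[List[float]]]:
    """Raw images as a 3-D array indexed by (exposure, row, column)."""
    return [[[0.0 for _ in range(w)] for _ in range(h)] for _ in range(n)]


def cutout(stack, exposure: int, row: int, col: int, size: int):
    """Slice a size-by-size thumbnail out of one exposure."""
    return [r[col:col + size] for r in stack[exposure][row:row + size]]


@dataclass
class Detection:
    """One measurement of an object in one exposure (illustrative fields)."""
    exposure: int
    flux: float
    flux_err: float                 # uncertainty stored next to the value
    thumbnail: List[List[float]]    # nested 2-D array


@dataclass
class CatalogObject:
    """One entry of the 1-D object catalog, with a nested time series."""
    object_id: int
    ra: float
    dec: float
    detections: List[Detection] = field(default_factory=list)


def light_curve(obj: CatalogObject) -> List[Tuple[int, float, float]]:
    """Return (exposure, flux, flux_err) tuples ordered by exposure."""
    ordered = sorted(obj.detections, key=lambda d: d.exposure)
    return [(d.exposure, d.flux, d.flux_err) for d in ordered]


stack = make_image_stack(N_EXPOSURES, HEIGHT, WIDTH)
stack[1][2][2] = 7.0  # pretend a source appears in exposure 1

catalog = [CatalogObject(object_id=0, ra=10.5, dec=-30.2)]
catalog[0].detections.append(Detection(1, 7.0, 0.5, cutout(stack, 1, 1, 1, 2)))
catalog[0].detections.append(Detection(0, 6.0, 0.4, cutout(stack, 0, 1, 1, 2)))

print(light_curve(catalog[0]))
```

In an array database the same structure would be declared once and partitioned across nodes; the nested lists here only illustrate how the dimensions and the per-object time series relate.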
While LSST's current baseline design uses more traditional components, the project anticipates that SciDB will be ready for benchmarking in time for the final technology selection prior to construction. To facilitate this, the LSST database team continues to actively assist SciDB development, ensuring its suitability for real problems in astronomy and other sciences.





