Open Letter to the SciDB Community
May 19, 2010
SciDB continues to gain momentum, advancing both community organization and product development. This letter shares our progress and our plans.
On the organizational front, we have just moved the SciDB website to our own hosting service. Many thanks to the University of Wisconsin for hosting SciDB since its inception. We have added a new page for news and upcoming events, cleaned up and focused our mailing lists, posted information and guidelines for new volunteers, and will soon open new channels of communication. We are also commissioning a dedicated cluster for testing and demonstrations. Our community is truly international, stretching from a very active group of Russian scientists and computer scientists to a group of volunteers from Persistent Systems in India who supplied a testing framework and are working on loaders. We also have a number of senior academics and graduate students across the US who are contributing significantly to the design and the code base, as well as running performance benchmarks to assess competing architectural design choices.
On the product development front, we have advanced well beyond the prototypes used for the 2009 XLDB and VLDB demos. Starting afresh, the design and development contributors have created the foundational architecture and core implementation that we will build on in the coming years. We are developing parallel, distributed implementations of common linear algebra and statistical operations used in a wide range of applications. These implementations are designed to work efficiently on extremely large and extremely sparse datasets. A preliminary version, R0.5, with user-level documentation, a reasonable set of array operations, and initial statistical operations, will be available at the end of June for pioneering users to try out. A community discussion forum and the project management and bug-tracking system will open at the same time as R0.5. A more fully featured and robust R1.0 is targeted for the end of September for early scientific adopters. All SciDB documentation and source code will always be open and freely available from the Download page on the SciDB website.
On the applications front, we welcome new use cases and new test datasets. We are currently working on applications in genomics, astronomy, high-energy physics, and financial markets. In April we gave an initial internal demonstration on genomic sequencing data. The SciDB data model, with its native support for uncertainty, application-specific handling of ‘nulls’, and queries that use ‘neighborhood’ data, will be ideally suited to the ‘short read’ data coming off sequencing machines. We will be doing significant work in this domain going forward.
We look forward to a very productive summer, growing both the community and the code base, and to receiving plenty of constructive feedback.
Mike Stonebraker
Jacek Becla
Marilyn Matz