Data Integration With OGSA-DAI

Researchers in the physical sciences continue to generate increasing amounts of data from simulations, sensors, and cataloging efforts such as DNA mappings and climate records. This growth has created a need for new data management tools that help researchers handle the volume of data and distribute it across disk volumes and administrative domains. Distributing the data facilitates greater collaboration between scientists and maximizes the data's value.

These requirements have led to the development of numerous data management systems. Two such systems are the Proactive Data Management System (PDMS) [4] and BioSimGrid [3]. BioSimGrid is a Data Grid project designed to distribute bio-molecular simulation results. PDMS is a data management tool developed by the University of Calgary Grid Research Centre (GRC) that facilitates the management and movement of data using metadata rather than physical file locations. The proliferation of different data management systems creates a need for an extensible framework that facilitates the integration of multiple data sources. OGSA-DAI [2] was designed to meet this goal. This document discusses how OGSA-DAI can be used to integrate the services provided by PDMS or BioSimGrid with other data facilities such as databases.

The rest of this document is structured as follows. Sections 2 and 3 provide an overview of BioSimGrid and PDMS, respectively. Section 4 discusses the architecture and limitations of the OGSA-DAI framework. Sections 5 and 6 discuss the integration of BioSimGrid and PDMS. Section 8 summarizes the document.