Proactive Data Management System (PDMS)

Abstract
The Proactive Data Management System (PDMS) is designed to manage large datasets within grid environments. It is particularly useful in scientific settings, where large amounts of data are often moved between computing and data archiving sites, and it is intended for groups that need to manage large data sets across several locations. PDMS manages and moves data using metadata: data items are identified by their inherent properties and characteristics rather than by the names of the files in which they are stored. Using metadata abstracts away the physical location of a file, which allows PDMS to manage replicas of a file transparently. PDMS builds on well-known Data Grid services, which allows it to interoperate with various workflow managers in use today.

Because data is managed through metadata, replication requests to PDMS can also be specified in terms of metadata. For example, a replication request can be "move the data generated by user A for project B within the last three months to site X". The metadata in this example are that the data was (i) generated by user A, (ii) generated for project B, and (iii) generated in the last three months. These metadata correspond to the logical files in which the data is stored. The logical files can be physically present at multiple locations, in which case PDMS locates all pieces of the dataset and initiates a transfer of every piece. Thus, for a given replication request, PDMS must perform two key tasks before initiating transfers: (i) use the metadata to establish the logical names of the files that match the metadata query, and (ii) select source replicas for those logical files that are not already at the destination. This metadata-based management of replicas fills a gap in previously available data management services.

PDMS is designed to restrict access to authenticated and authorized users who have permission to use the system.
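The two tasks that precede a transfer (resolving a metadata query to logical file names, then choosing a source replica for each file missing at the destination) can be illustrated with a minimal Python sketch. It is not the PDMS implementation: the `LogicalFile` record, the function names, the site names, and the "alphabetically first site" selection policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LogicalFile:
    """Hypothetical metadata catalog entry for one logical file."""
    name: str
    user: str
    project: str
    created: date


def match_logical_files(catalog, user, project, since):
    """Task (i): resolve a metadata query to logical file names."""
    return [
        f.name for f in catalog
        if f.user == user and f.project == project and f.created >= since
    ]


def plan_transfers(logical_names, replicas, destination):
    """Task (ii): pick a source replica for each file not yet at the destination.

    `replicas` maps a logical name to the set of sites holding a copy.
    """
    plan = {}
    for name in logical_names:
        sites = replicas.get(name, set())
        if destination in sites or not sites:
            continue  # already present, or no known replica to copy from
        plan[name] = min(sites)  # placeholder policy: alphabetically first site
    return plan


# Illustrative data only: "data generated by user A for project B
# within the last three months, moved to site X".
today = date(2024, 6, 30)
catalog = [
    LogicalFile("run-001", "A", "B", date(2024, 6, 1)),
    LogicalFile("run-002", "A", "B", date(2024, 1, 15)),  # older than three months
    LogicalFile("run-003", "C", "B", date(2024, 6, 10)),  # different user
    LogicalFile("run-004", "A", "B", date(2024, 5, 20)),
]
replicas = {
    "run-001": {"siteY", "siteZ"},
    "run-004": {"siteX", "siteY"},  # already at the destination
}

names = match_logical_files(catalog, "A", "B", today - timedelta(days=90))
transfers = plan_transfers(names, replicas, "siteX")
```

In this sketch `names` holds the logical files that match the query and `transfers` maps each file that still needs to be copied to the site chosen as its source. A real deployment would presumably replace the placeholder policy with one based on network or storage conditions.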
Files are stored in logical groupings referred to as collections, and PDMS also restricts each user's access to specific collections. These access restrictions resemble file ownership: collections have owners as well as read and write privileges. A more complete description of the access control can be found in [3]. PDMS maintains the consistency of the data in a collection. Currently, this includes not allowing the same physical file to be registered twice (as two separate logical files). PDMS also ensures that users conform to the schema they include in their registration request, and it could be configured to require each collection to conform to a specific schema. This is particularly useful for large groups that need to be sure all metadata contains certain information and want to prevent buggy registration processes from introducing inconsistent or incomplete metadata. Because consistency checking can be expensive, the consistency requirements of PDMS are intended to be configurable.
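The consistency rules described above can be sketched in Python as follows. This is a hypothetical in-memory model, not the PDMS data model or API: the `Collection` class, the `register` signature, the `check_consistency` switch, and the physical locations are all assumed for illustration.

```python
class RegistrationError(Exception):
    """Raised when a registration request fails a consistency check."""


class Collection:
    """Hypothetical in-memory collection with configurable consistency checks."""

    def __init__(self, required_fields=(), check_consistency=True):
        self.required_fields = set(required_fields)  # collection-wide schema
        self.check_consistency = check_consistency   # checks can be expensive
        self.logical_by_physical = {}                # physical location -> logical name
        self.metadata = {}                           # logical name -> metadata dict

    def register(self, logical_name, physical_location, schema, metadata):
        if self.check_consistency:
            # The same physical file may not be registered twice.
            if physical_location in self.logical_by_physical:
                raise RegistrationError("physical file already registered")
            # The metadata must conform to the schema sent with the request.
            if set(metadata) != set(schema):
                raise RegistrationError("metadata does not match request schema")
            # Optionally, the collection enforces its own required fields.
            missing = self.required_fields - set(schema)
            if missing:
                raise RegistrationError(f"missing required fields: {sorted(missing)}")
        self.logical_by_physical[physical_location] = logical_name
        self.metadata[logical_name] = dict(metadata)


# Illustrative usage with made-up locations.
coll = Collection(required_fields={"user", "project"})
coll.register("run-001", "siteY:/data/run-001.dat",
              ["user", "project"], {"user": "A", "project": "B"})
```

Registering a second logical name for `siteY:/data/run-001.dat`, or omitting the `project` field, raises `RegistrationError` in this sketch. Constructing the collection with `check_consistency=False` skips all three checks, which mirrors the idea that consistency requirements are configurable because checking is costly.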
Keywords
Computer Science