UNITY - A DATABASE INTEGRATION TOOL

Date
2000-10-16
Abstract
The World-Wide Web (WWW) gives users access to a vast number of data sources distributed across the planet. Internet protocols such as TCP/IP and HTTP provide the mechanisms for exchanging the data. However, a fundamental problem with distributed data access is determining which data are semantically equivalent. Ideally, users should be able to extract data from multiple Internet sites and have it automatically combined and presented to them in a usable form. No system has been able to accomplish these goals because of limitations in expressing and capturing data semantics.
This paper details the construction, function, and deployment of Unity, a database integration software package that captures database semantics so that databases may be integrated automatically. Unity is the tool we use to implement the integration architecture detailed in our previous work. That architecture focuses on capturing the semantics of data stored in databases, with the goal of integrating data sources within a company, across a network, and even on the World-Wide Web. Our approach to capturing data semantics revolves around the definition of a standardized dictionary, which provides terms for referencing and categorizing data. These standardized terms are then used in semantic specifications called X-Specs, which store metadata and semantic descriptions of the data. Using these semantic specifications, it becomes possible to integrate diverse data sources even though they were not originally designed to work together. We present the centralized version of the architecture, which allows data source information (represented using X-Specs) to be integrated independently into a unified view of the data. The architecture preserves full autonomy of the underlying databases, which the user accesses transparently from a central portal.
Distributing the architecture would bypass the central portal and allow the integration of web data sources to be performed by a user's browser. A system that achieves automatic integration of data sources would have a major impact on how the Web is used and delivered. Unity is the bridge between concept and implementation. It is a complete software package that supports the construction and modification of standardized dictionaries, parses database schemas and metadata to construct X-Specs, and implements the integration algorithm that combines X-Specs into an integrated view. Further, Unity provides a mechanism for building queries on the integrated view, together with algorithms for mapping semantic queries on the integrated view to structural (SQL) queries on the underlying data sources.
Notes: Jointly released technical report. Released as TR-00-17 for the University of Manitoba, and 2000-664-16 for the University of Calgary.
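The pipeline the abstract describes (dictionary terms, X-Specs, an integrated view, and semantic-to-SQL mapping) can be illustrated with a minimal Python sketch. This is not Unity's implementation or its X-Spec format: the names `DICTIONARY`, `XSpecField`, `XSpec`, `integrate`, and `to_sql`, the dotted term syntax, and the sample databases are all hypothetical, chosen only to show the general idea of matching fields by shared standardized terms.

```python
from dataclasses import dataclass

# Hypothetical standardized dictionary: the agreed-upon terms (an assumption,
# not Unity's actual dictionary contents).
DICTIONARY = {"Customer", "Name", "Id"}


@dataclass(frozen=True)
class XSpecField:
    table: str   # physical table name in the source database
    column: str  # physical column name
    term: str    # semantic name built from dictionary terms, e.g. "Customer.Name"


@dataclass
class XSpec:
    database: str
    fields: list


def integrate(xspecs):
    """Group the physical fields of all sources by their shared semantic term."""
    view = {}
    for spec in xspecs:
        for field in spec.fields:
            # Every component of a semantic name must come from the dictionary.
            for part in field.term.split("."):
                if part not in DICTIONARY:
                    raise ValueError(f"unknown dictionary term: {part}")
            view.setdefault(field.term, []).append(
                (spec.database, field.table, field.column)
            )
    return view


def to_sql(view, terms):
    """Map a semantic query (a list of terms) to one SELECT per source table."""
    columns = {}
    for term in terms:
        for database, table, column in view[term]:
            columns.setdefault((database, table), []).append(
                f'{column} AS "{term}"'
            )
    return {
        key: f"SELECT {', '.join(cols)} FROM {key[1]}"
        for key, cols in columns.items()
    }


# Two independently designed sources that describe the same concept.
sales = XSpec("sales_db", [
    XSpecField("cust", "cname", "Customer.Name"),
    XSpecField("cust", "cid", "Customer.Id"),
])
crm = XSpec("crm_db", [
    XSpecField("clients", "full_name", "Customer.Name"),
])

view = integrate([sales, crm])
queries = to_sql(view, ["Customer.Name"])
```

In this sketch the user asks only for the semantic term `Customer.Name`; the per-source SQL is derived from the integrated view, so neither source schema has to be known or changed by the user.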
Keywords
Computer Science