K-nearest Neighbor Rule: A Replica Selection Approach in Grid Environment

Date
2006-02-14
Abstract
Grid technology was developed to share data across many organizations in different geographical locations. Data replication supports this sharing by caching data closer to users. Replication stores copies at different locations so that data can be recovered easily if the copy at one location is lost. Moreover, keeping data closer to the user through replication can improve data access performance substantially. When several sites hold replicas, selecting the best replica yields significant benefits. Network performance plays a major role in this selection. However, current research shows that other factors, such as disk I/O, also play an important role in file transfer. In this paper, we describe a new optimization technique that considers both disk throughput and network latency when selecting the best replica. The history of previous data transfers can help predict the best site holding a replica, and the k-nearest neighbor rule is one such predictive technique. When a new request for the best replica arrives, the technique searches all previous data for a subset of earlier file requests that are similar to the new one and uses them to predict the best site holding the replica. In this work, we implement and test the k-nearest neighbor algorithm for various file access patterns and compare the results with the traditional replica catalog based model. The results demonstrate that our model outperforms the traditional model for sequential and unitary random file access requests.
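The k-nearest neighbor rule described in the abstract can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. The feature vector (two numeric values per request, scaled to the range 0 to 1), the site names, the function name `knn_select_site`, and the choice of Euclidean distance with a majority vote are all assumptions made for illustration.

```python
from collections import Counter
from math import dist

def knn_select_site(history, request, k=3):
    """Predict a replica site for `request` from past transfers.

    history: list of (features, best_site) pairs, where `features` is a
             tuple of numbers (hypothetical, pre-scaled to [0, 1]) and
             `best_site` is the site that served that past request best.
    request: feature tuple of the new request, same length as above.
    """
    if not history:
        raise ValueError("history is empty")
    # Rank past requests by Euclidean distance to the new request
    # and keep the k most similar ones.
    nearest = sorted(history, key=lambda rec: dist(rec[0], request))[:k]
    # Majority vote over the sites recorded for those neighbors.
    # On a tie, the site of the closest neighbor among the tied sites wins.
    votes = Counter(site for _, site in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical transfer history: (scaled features, best site).
history = [
    ((0.10, 0.20), "site_a"),
    ((0.20, 0.10), "site_a"),
    ((0.15, 0.25), "site_a"),
    ((0.80, 0.90), "site_b"),
    ((0.90, 0.85), "site_b"),
    ((0.85, 0.70), "site_b"),
]

print(knn_select_site(history, (0.82, 0.80), k=3))
```

Scaling the features matters in a sketch like this: an unscaled feature with a large numeric range would dominate the Euclidean distance and hide the influence of the others.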
Keywords
Computer Science