K-nearest Neighbor Rule: A Replica Selection Approach in Grid Environment
Date
2006-02-14
Abstract
Grid technology was developed to share data across many organizations in
different geographical locations. Data replication is an effective technique
for moving data in such environments because it caches data closer to users.
The idea of replication is to store copies of data at different locations so
that the data can easily be recovered if the copy at one location is lost.
Moreover, if data can be kept closer to users via replication, data access
performance can be improved dramatically. When different sites hold replicas,
selecting the best replica yields significant benefits. Network performance
plays a major role in replica selection; however, current research shows that
other factors, such as disk I/O, also play an important role in file transfer.
In this paper, we describe a new optimization technique that considers both
disk throughput and network latency when selecting the best replica. The
history of previous data transfers can help predict the best site holding a
replica, and the k-nearest neighbor rule is one such predictive technique.
When a new request for the best replica arrives, the technique examines all
previous data to find a subset of previous file requests that are similar to
the new one, and uses them to predict the best site holding the replica. In
this work, we implement and test the k-nearest neighbor algorithm for various
file access patterns and compare the results with the traditional replica
catalog based model. The results demonstrate that our model outperforms the
traditional model for sequential and unitary random file access requests.
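The k-nearest neighbor rule summarized above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation: the record schema (`TransferRecord`), the feature choice (file size in MB and hour of day), Euclidean distance with majority voting, and the hypothetical cost model `estimated_transfer_time`, which combines network latency with the bottleneck of network bandwidth and disk throughput to reflect the abstract's point that disk I/O matters alongside the network.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class TransferRecord:
    """One past file request (hypothetical schema, not from the paper)."""
    features: tuple[float, ...]  # assumed features, e.g. (file size in MB, hour of day)
    best_site: str               # site that delivered the file fastest


def estimated_transfer_time(size_mb: float, latency_s: float,
                            net_mb_per_s: float, disk_mb_per_s: float) -> float:
    """Hypothetical cost model: network latency plus file size divided by the
    slower of network bandwidth and disk throughput (the bottleneck)."""
    return latency_s + size_mb / min(net_mb_per_s, disk_mb_per_s)


def knn_predict_site(history: list[TransferRecord],
                     query: tuple[float, ...], k: int = 3) -> str:
    """Predict the best replica site by majority vote among the k past
    requests whose feature vectors are closest (Euclidean) to the query."""
    if not history:
        raise ValueError("history is empty")
    k = min(k, len(history))  # cannot use more neighbors than records
    nearest = sorted(history, key=lambda r: math.dist(r.features, query))[:k]
    votes = Counter(r.best_site for r in nearest)
    # Ties keep first-encountered order, so the closer neighbor's site wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical sites: (latency in s, network MB/s, disk MB/s)
    sites = {"siteA": (0.05, 100.0, 40.0), "siteB": (0.20, 50.0, 200.0)}

    def best_by_cost(size_mb: float) -> str:
        # Label past requests with the cheapest site under the cost model.
        return min(sites, key=lambda s: estimated_transfer_time(size_mb, *sites[s]))

    past = [(1.0, 9.0), (2.0, 10.0), (5.0, 14.0),
            (900.0, 2.0), (1200.0, 3.0), (1500.0, 23.0)]
    history = [TransferRecord(f, best_by_cost(f[0])) for f in past]
    print(knn_predict_site(history, (1000.0, 4.0)))  # prints "siteB"
```

In a real deployment the `best_site` labels would come from observed transfers rather than from a cost model, and features with different units (megabytes versus hours) would normally be rescaled before computing distances; here file size dominates the distance simply because it is left unscaled.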
Keywords
Computer Science