Measurement, modeling, and analysis of the file hosting ecosystem

Date
2012
Abstract
The Web has recently witnessed the emergence of file hosting services. These services provide users with a Web interface to upload, manage, and share files in the cloud. We present a comprehensive, longitudinal characterization study of the file hosting ecosystem. We perform a detailed multi-level analysis of the usage behavior, infrastructure properties, content characteristics, and user-perceived performance of several top file hosting services. We build a measurement infrastructure that captures the characteristics of the ecosystem from multiple viewpoints across multiple layers. Our study uses multiple datasets collected over extended periods of time from passive measurements at an edge network, active measurements of an index site, and third-party Web analytics sources. Our two primary datasets are HTTP transaction summaries and connection summaries of all Internet traffic collected at a large campus edge network over a one-year period. We devise methods to identify user clickstreams in the HTTP transaction summary trace, including the identification of free and premium user instances, as well as the identification of content that is split into multiple pieces and downloaded using multiple transactions. We use the connection summary trace to understand and model salient flow-level and host-level properties of file hosting traffic. We augment our analysis with measurements of global file hosting dynamics from third-party analytics sources, as well as crawls of file hosting links on an index site. Throughout this characterization, we compare and contrast these services with each other as well as with peer-to-peer file sharing and other media sharing services. To the best of our knowledge, this is the largest characterization study of the file hosting ecosystem.
Our results have implications for caching, network management, content placement, and data center provisioning, and are likely to be relevant for both researchers and network administrators.
Description
Bibliography: p. 176-192
Citation
Mahanti, A. (2012). Measurement, modeling, and analysis of the file hosting ecosystem (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/5012