Far, Behrouz H.; Esmaeilpour, Arina
2016-06-13; 2016-06-13; 2016; 2016
http://hdl.handle.net/11023/3053

With an accelerating rate of data generation, sophisticated techniques are essential to meet scalability requirements. One promising avenue for handling large datasets is distributed storage and processing, for which Hadoop is a well-known framework. Data summarization is another useful concept for managing large datasets: summarization techniques are intended to produce compact yet representative summaries of the entire dataset. Combining these tools allows a distributed implementation of data summarization. In this thesis, this goal is achieved by proposing and implementing a distributed Gaussian Mixture Model Summarization using the MapReduce framework (MR-SGMM). The proposed method first clusters a dataset with the density-based clustering algorithm DBSCAN, and then summarizes each discovered cluster using the SGMM approach in a distributed manner. The implementation is tested with synthetic and real datasets to demonstrate its validity and efficiency.

eng
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Artificial Intelligence; Computer Science; Distributed density-based clustering; Distributed cluster summarization; Gaussian mixture model; MapReduce
Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework
master thesis
10.11575/PRISM/25727
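
The two-stage idea described in the abstract (density-based clustering with DBSCAN, followed by a per-cluster summary computed in a map/reduce fashion) can be illustrated with a minimal single-machine sketch. This is not the thesis's MR-SGMM implementation. The record does not give the SGMM details, so as a loudly stated simplification each cluster is summarized here by one diagonal Gaussian (count, mean, variance) rather than a mixture. The function names `dbscan`, `map_split`, `combine`, and `summarize`, and the parameters `eps`, `min_pts`, and `n_splits`, are hypothetical, and the map, shuffle, and reduce phases are simulated in plain Python instead of running on Hadoop.

```python
from collections import defaultdict
from functools import reduce
from math import dist


def dbscan(points, eps, min_pts):
    """Tiny O(n^2) DBSCAN; returns one label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def map_split(split):
    """Mapper: emit (cluster_id, (count, per-dim sum, per-dim sum of squares))."""
    for label, p in split:
        if label != -1:  # noise points are not summarized
            yield label, (1, list(p), [x * x for x in p])


def combine(a, b):
    """Reducer step: add two partial sufficient statistics."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])


def summarize(labelled_points, n_splits=2):
    """Simulated MapReduce job: one diagonal Gaussian per cluster (assumption)."""
    splits = [labelled_points[k::n_splits] for k in range(n_splits)]
    grouped = defaultdict(list)
    for split in splits:  # each split stands in for one map task
        for key, value in map_split(split):
            grouped[key].append(value)  # stand-in for the shuffle phase
    summary = {}
    for key, values in grouped.items():
        n, s, ss = reduce(combine, values)
        mean = [x / n for x in s]
        var = [q / n - m * m for q, m in zip(ss, mean)]
        summary[key] = {"n": n, "mean": mean, "var": var}
    return summary


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    labels = dbscan(pts, eps=1.5, min_pts=3)
    print(summarize(list(zip(labels, pts))))
```

Because the mapper emits only sufficient statistics (count, sum, sum of squares), the reducer can merge partial results from any number of splits in any order, which is the property that makes this kind of summary easy to distribute. A real mixture-model summary would need a different reducer.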