Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework

atmire.migration.oldid: 4489
dc.contributor.advisor: Far, Behrouz H.
dc.contributor.author: Esmaeilpour, Arina
dc.date.accessioned: 2016-06-13T21:32:48Z
dc.date.available: 2016-06-13T21:32:48Z
dc.date.issued: 2016
dc.date.submitted: 2016 [en]
dc.description.abstract: With an accelerating rate of data generation, sophisticated techniques are essential to meet scalability requirements. One promising avenue for handling large datasets is distributed storage and processing, for which Hadoop is a well-known framework. Data summarization is another useful concept for managing large datasets: summarization techniques are intended to produce compact yet representative summaries of the entire dataset. Combining these tools allows a distributed implementation of data summarization. In this thesis, this goal is achieved by proposing and implementing a distributed Gaussian Mixture Model Summarization using the MapReduce framework (MR-SGMM). The main purpose of the proposed method is to summarize a dataset by first clustering it with the density-based algorithm DBSCAN and then summarizing each discovered cluster using the SGMM approach in a distributed manner. Tests of the implementation on synthetic and real datasets are used to demonstrate its validity and efficiency. [en_US]
dc.identifier.citation: Esmaeilpour, A. (2016). Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25727 [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/25727
dc.identifier.uri: http://hdl.handle.net/11023/3053
dc.language.iso: eng
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary [en]
dc.publisher.place: Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Artificial Intelligence
dc.subject: Computer Science
dc.subject.classification: Distributed density-based clustering [en_US]
dc.subject.classification: Distributed cluster summarization [en_US]
dc.subject.classification: Gaussian mixture model [en_US]
dc.subject.classification: MapReduce [en_US]
dc.title: Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework
dc.type: master thesis
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.item.requestcopy: true