Data Privacy in High-Dimensional and Big Data Age

atmire.migration.oldid: 3897
dc.contributor.advisor: Barker, Kenneth
dc.contributor.author: Zakerzadeh, Hessam
dc.contributor.committeemember: Savavi Naeni, Reyhaneh
dc.contributor.committeemember: Elhajj, Reda
dc.contributor.committeemember: Hagen, Gregory
dc.contributor.committeemember: Matwin, Stan
dc.date.accessioned: 2015-12-07T21:10:08Z
dc.date.available: 2015-12-07T21:10:08Z
dc.date.issued: 2015-12-07
dc.date.submitted: 2015
dc.description.abstract: The prevalent need for publicly available data sets, together with the privacy breaches that occur when such data is released, increases the need for resilient and precise techniques for privacy-preserving data publishing. To this end, numerous privacy models and algorithms have been developed for different data types. However, privacy algorithms still suffer from two fundamental problems: data dimensionality and cardinality growth. Data dimensionality has remained a challenge for a wide variety of algorithms in data mining, clustering, classification and privacy. In the privacy domain, simply applying existing privacy algorithms to high-dimensional data results in unacceptable information loss. Like dimensionality, cardinality growth is an open problem in the privacy realm: existing privacy algorithms cannot be run in acceptable time over terabyte-scale data sets. This thesis shows that some common properties of real data can be leveraged to ameliorate the negative effects of the curse of dimensionality in practice. In real data sets, many dimensions exhibit high levels of inter-attribute correlation. Such correlations enable the use of a process known as vertical fragmentation, which creates vertical subsets of smaller dimensionality. This in turn allows an anonymization process that combines the results obtained from multiple independent fragments. This dissertation presents a vertical fragmentation approach that is general enough to be applied to both the k-anonymity and l-diversity models. In addition, this dissertation presents a new approach to privacy-preserving data mining of very large data sets using MapReduce. Two of the most widely used privacy models for anonymization, k-anonymity and l-diversity, are studied. We also investigate privacy issues in publishing graph data, which commonly appears in the form of big data sets (e.g., social networks).
Graph data is generally more difficult to anonymize because an attacker can leverage the structural information “hidden” in the graph to infer sensitive information. In big graph data publishing, we focus only on protecting attributes, as they typically carry the sensitive information.
dc.identifier.citation: Zakerzadeh, H. (2015). Data Privacy in High-Dimensional and Big Data Age (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25520
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/25520
dc.identifier.uri: http://hdl.handle.net/11023/2665
dc.language.iso: eng
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary
dc.publisher.place: Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Computer Science
dc.subject.classification: Data Privacy
dc.subject.classification: High-dimensional data
dc.subject.classification: Big Data
dc.subject.classification: k-anonymity
dc.subject.classification: l-diversity
dc.title: Data Privacy in High-Dimensional and Big Data Age
dc.type: doctoral thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Calgary
thesis.degree.name: Doctor of Philosophy (PhD)
ucalgary.item.requestcopy: true