Data Privacy in High-Dimensional and Big Data Age

Zakerzadeh, Hessam

atmire.migration.oldid	3897
dc.contributor.advisor	Barker, Kenneth
dc.contributor.author	Zakerzadeh, Hessam
dc.contributor.committeemember	Savavi Naeni, Reyhaneh
dc.contributor.committeemember	Elhajj, Reda
dc.contributor.committeemember	Hagen, Gregory
dc.contributor.committeemember	Matwin, Stan
dc.date.accessioned	2015-12-07T21:10:08Z
dc.date.available	2015-12-07T21:10:08Z
dc.date.issued	2015-12-07
dc.date.submitted	2015	en
dc.description.abstract	The widespread need for publicly available data sets, together with the privacy breaches that occur when such data is released, increases the need for resilient and precise techniques for privacy-preserving data publishing. To this end, numerous privacy models and algorithms have been developed for different data types. However, privacy algorithms still suffer from two fundamental problems: data dimensionality and cardinality growth. Dimensionality remains a challenge for a wide variety of algorithms in data mining, clustering, classification and privacy. In the privacy domain, simply applying existing privacy algorithms to high-dimensional data results in unacceptable information loss. Like the dimensionality problem, cardinality growth is an open problem in the privacy realm: existing privacy algorithms cannot be run in an acceptable time over terabyte-scale data sets. This thesis shows that some common properties of real data can be leveraged to ameliorate the negative effects of the curse of dimensionality in practice. In real data sets, many dimensions exhibit high levels of inter-attribute correlation. Such correlations enable the use of a process known as vertical fragmentation to create vertical subsets of smaller dimensionality. This allows an anonymization process based on combining results from multiple independent fragments. This dissertation presents a vertical fragmentation approach that is general enough to be applied to the k-anonymity and l-diversity models. In addition, this dissertation presents a new approach to privacy-preserving data mining of massive data sets using MapReduce, studying two of the most widely used privacy models for anonymization, k-anonymity and l-diversity. We also investigate privacy issues in publishing graph data, which commonly appears as big data sets (e.g., social networks).
Graph data is generally more difficult to anonymize because the structural information “hidden” in the graph can be leveraged by an attacker to infer sensitive information. In big graph data publishing, we focus only on protecting attributes, as they typically carry sensitive information.	en_US
dc.identifier.citation	Zakerzadeh, H. (2015). Data Privacy in High-Dimensional and Big Data Age (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25520	en_US
dc.identifier.doi	http://dx.doi.org/10.11575/PRISM/25520
dc.identifier.uri	http://hdl.handle.net/11023/2665
dc.language.iso	eng
dc.publisher.faculty	Graduate Studies
dc.publisher.institution	University of Calgary	en
dc.publisher.place	Calgary	en
dc.rights	University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject	Computer Science
dc.subject.classification	Data Privacy	en_US
dc.subject.classification	High-dimensional data	en_US
dc.subject.classification	Big Data	en_US
dc.subject.classification	k-anonymity	en_US
dc.subject.classification	l-diversity	en_US
dc.title	Data Privacy in High-Dimensional and Big Data Age
dc.type	doctoral thesis
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Calgary
thesis.degree.name	Doctor of Philosophy (PhD)
ucalgary.item.requestcopy	true

Collections

Open Theses and Dissertations