The use of concept hierarchies in privacy preserving data acquisition for data mining

Date
2012
Abstract
This thesis presents a concept hierarchy-based approach to privacy preserving data collection for data mining called the p-level model. The p-level model allows data providers to divulge information at any chosen privacy level (p-level), on any attribute. Data collected at a high p-level signifies divulgence at a higher conceptual level and thus ensures more privacy. Data providers have greater control of their privacy preferences, and have provided significantly (25-75%) more personal data values, at various p-levels, than when providing the same information using the regular, fixed-level (f-level) method of data collection. However, the data mining process, which involves the integration of various data values, can constitute a privacy breach if combinations of attributes at the various p-levels result in the inference of knowledge that exists at lower p-levels. Providing anonymity guarantees prior to release can further protect the collected data set from privacy breaches due to linking the released data set with external data sets. This thesis describes the p-level reduction phenomenon and proposes methods to identify and control the occurrence of this privacy breach. One objective of this thesis is to explore the feasibility of applying data collected with the p-level approach to data mining problems. We apply data collected using the p-level approach to a data classification problem, and discover that the mining accuracy of the p-level approach classifier is comparable to that of the f-level (no privacy) approach, thus we conclude that the p-level approach is beneficial for the purpose of privacy preserving data collection.
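As a rough illustration of divulging a value at a chosen level of a concept hierarchy, the following minimal sketch may help. It is not taken from the thesis: the `PARENT` hierarchy, the place names, the `generalize` function, and the convention that level 0 means the exact value are all assumptions made for this example.

```python
# Hypothetical child -> parent links for a "location" attribute.
# The hierarchy and its values are illustrative assumptions, not thesis data.
PARENT = {
    "Calgary": "Alberta",
    "Edmonton": "Alberta",
    "Toronto": "Ontario",
    "Alberta": "Canada",
    "Ontario": "Canada",
    "Canada": "Any",
}


def generalize(value, p_level):
    """Return `value` moved up `p_level` steps in the hierarchy.

    Assumed convention: level 0 is the exact value, and each extra level
    replaces the value with its parent concept. Climbing stops at the root.
    """
    if p_level < 0:
        raise ValueError("p_level must be non-negative")
    for _ in range(p_level):
        if value not in PARENT:  # already at the root concept
            break
        value = PARENT[value]
    return value


if __name__ == "__main__":
    for level in range(4):
        print(level, generalize("Calgary", level))
```

Under these assumptions, a provider who picks a higher level for an attribute reveals only a coarser concept, which is the intuition behind a higher p-level giving more privacy.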
Description
Bibliography: p. 156-168
Includes copy of ethics approval. Original copy with original Partial Copyright Licence.
Citation
Williams, A. A. (2012). The use of concept hierarchies in privacy preserving data acquisition for data mining (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/4735