Towards theoretical and practical evaluation of privacy and utility of data sanitization mechanisms

Massive collection, aggregation and analysis of data about individuals on the Internet raises the fundamental issue of privacy protection. Releasing collected data is often beneficial for research, testing, marketing, decision making and data mining. However, published data can violate individuals' privacy, especially when aggregated with other sources of data. In response to privacy concerns, and to ensure the privacy of the individuals in a published dataset, data are sanitized by applying specific operations to them prior to publication. The cost of applying these privacy operations to the original collected data is the loss of some information. Hence, data utility is another important factor that should be considered in data sanitization mechanisms. In this thesis, we focus primarily on privacy and utility issues of sanitization mechanisms. There are several sanitization mechanisms with different notions of privacy and utility. To be able to measure, set and compare the level of privacy protection and utility of these mechanisms, there is a need to translate them into a unified framework for evaluation. In this thesis, a thorough theoretical and empirical investigation of the privacy and utility of sanitization mechanisms in non-interactive data release is carried out by developing two frameworks. Furthermore, we use the specifications of several sanitization mechanisms to evaluate our frameworks. We first propose a novel framework that represents a mechanism as a noisy channel and evaluates its privacy and utility using information-theoretic measures. We show that the deterministic publishing property that is used in most of these mechanisms reduces privacy guarantees and causes information to leak. We also show that by using this framework we can compute the sanitization mechanism's utility from the point of view of a data user.
By formalizing the background knowledge of the adversary and the data user, we demonstrate its substantial effect on these metrics. We use k-anonymity, a popular sanitization mechanism, as an example and use the framework to analyze the privacy and utility offered by the mechanism. We then provide a mining framework that can be specialized to specific scenarios, modeling privacy and usefulness notions and quantifying their levels for the given dataset. This framework uses a definition of the utility of mining tasks that data providers can use to measure and compare the utility of data mining results obtained from the original and sanitized datasets. This provides a decision support mechanism for data providers to select appropriate sanitization mechanisms. The utility definition is general and captures the information obtained by any data user. The power of the framework is in its adaptability to capture various notions of privacy, utility and adversarial power for comparing sanitization systems in a particular setting.
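The noisy-channel view described in the abstract can be illustrated with a small sketch. This is not the thesis's own framework or metric: it assumes, purely for illustration, that leakage is measured as the mutual information I(X;Y) between a sensitive value X and the published value Y, that the adversary's background knowledge is a prior over X, and that the mechanism is a matrix of probabilities p(y|x). The function name, the four-value domain and both example mechanisms are hypothetical.

```python
import math


def mutual_information(prior, channel):
    """Return I(X;Y) in bits for a prior p(x) and a channel matrix p(y|x)."""
    n_out = len(channel[0])
    # Marginal distribution of the published value y.
    p_y = [
        sum(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(n_out)
    ]
    mi = 0.0
    for x, p_x in enumerate(prior):
        for y in range(n_out):
            joint = p_x * channel[x][y]
            if joint > 0:  # zero-probability pairs contribute nothing
                mi += joint * math.log2(joint / (p_x * p_y[y]))
    return mi


# Assumed adversary background knowledge: a uniform prior over four values.
prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical deterministic generalization: values 0,1 -> group A; 2,3 -> group B.
deterministic = [[1, 0], [1, 0], [0, 1], [0, 1]]

# Hypothetical randomized variant: the correct group is published with
# probability 0.75 and the other group with probability 0.25.
randomized = [[0.75, 0.25], [0.75, 0.25], [0.25, 0.75], [0.25, 0.75]]

leak_det = mutual_information(prior, deterministic)
leak_rand = mutual_information(prior, randomized)
print(f"deterministic: {leak_det:.4f} bits, randomized: {leak_rand:.4f} bits")
```

In this toy setting the deterministic mapping reveals more about X than the randomized one. Changing `prior` shows how the adversary's background knowledge alters the measured leakage for the same mechanism.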
Bibliography: p. 139-151
Askari, M. (2012). Towards theoretical and practical evaluation of privacy and utility of data sanitization mechanisms (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from doi:10.11575/PRISM/5037