Utility of Knowledge Discovered from Sanitized Data

Sramka, Michal; Safavi-Naini, Reihaneh; Denzinger, Jorg; Askari, Mina; Gao, Jie

Files

19712.pdf (380.89 KB)

Date

2008-09-30

Authors

Sramka, Michal

Safavi-Naini, Reihaneh

Denzinger, Jorg

Askari, Mina

Gao, Jie

Abstract

While much attention has been paid to data sanitization methods that aim to protect users’ privacy, far less emphasis has been placed on the usefulness of the sanitized data from the viewpoint of knowledge discovery systems. We consider this question and ask whether sanitized data can be used to obtain knowledge that was not defined at the time of sanitization. We propose a utility function for knowledge discovery algorithms that quantifies the value of the knowledge from the perspective of its users. We then use this utility function to evaluate the usefulness of the extracted knowledge when knowledge building is performed over the original data, and compare it to the case when knowledge building is performed over the sanitized data. Our experiments use an existing cooperative learning model of knowledge discovery and medical data, anonymized and perturbed using two widely known sanitization techniques, ε-differential privacy and k-anonymity. Our experimental results show that although the utility of sanitized data can be drastically reduced, and in some cases completely lost, there are cases where the utility can be preserved. This confirms our strategy of looking at triples, each consisting of a utility function, a sanitization mechanism, and a knowledge discovery algorithm, that are useful in practice. We categorize a few instances of such triples based on the usefulness obtained from experiments over a single database of medical records. We discuss our results and outline directions for future work.
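For readers unfamiliar with the two sanitization notions named in the abstract, the following is a minimal sketch of what each one means on a toy table. It is not the authors' implementation or their utility function. The record fields, the helper names (`is_k_anonymous`, `laplace_noise`, `dp_count`), and the use of the Laplace mechanism for a counting query are assumptions made purely for illustration.

```python
import random
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (hypothetical helper)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng):
    """Release a counting query with Laplace noise of scale 1/epsilon.
    A count has sensitivity 1, so this satisfies epsilon-differential
    privacy under the standard Laplace mechanism (illustrative only)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy medical-style records; all values are invented for illustration.
records = [
    {"age": "30-39", "zip": "T2N", "diagnosis": "flu"},
    {"age": "30-39", "zip": "T2N", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "T3A", "diagnosis": "flu"},
    {"age": "40-49", "zip": "T3A", "diagnosis": "flu"},
]

rng = random.Random(0)
print(is_k_anonymous(records, ["age", "zip"], k=2))
print(dp_count(records, lambda r: r["diagnosis"] == "flu", epsilon=0.5, rng=rng))
```

A smaller `epsilon` adds more noise to the released count, and a larger `k` forces coarser quasi-identifier groups. Both are simple instances of the privacy and utility tension that the abstract describes.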

Keywords

utility of knowledge, privacy-preserving data mining, differential privacy, k-anonymity, cooperative learning

URI

http://hdl.handle.net/1880/49026

Collections

Science Research & Publications
