Keynote Speakers

The explosive progress in networking, storage, and processor technologies is resulting in an unprecedented amount of digitization of information. In concert with this dramatic increase in digital data, concerns about the privacy of personal information have emerged globally. The concerns over massive collection of data are naturally extending to analytic tools applied to data. Data mining, with its promise to efficiently discover valuable, non-obvious information from large databases, is particularly vulnerable to misuse.

Inspired by the privacy tenet of the Hippocratic Oath, we argue that future database systems must include responsibility for the privacy of data they manage as a founding tenet. We enunciate the key principles for such Hippocratic database systems, distilled from the principles behind current privacy legislations and guidelines. We identify the technical challenges and problems in designing Hippocratic databases, and also outline some solution approaches.

One way of preserving privacy of individual data records would be to perturb them. Since the primary task in data mining is the development of models about aggregated data, we explore if we can develop accurate models without access to precise information in individual data records. We consider the concrete case of building a decision-tree classifier from perturbed data. While it is not possible to accurately estimate original values in individual data records, we describe a reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.

We will conclude by pointing out some open research problems.