-
Dr. Rakesh Agrawal, IBM Almaden Lab., USA
Rakesh Agrawal leads the Quest project at the IBM Almaden Research Center,
which pioneered key data mining concepts and technologies. IBM's
commercial data mining product, Intelligent Miner, grew out of this work.
His research has been incorporated into other IBM products, including DB2
Mining Extender, DB2 OLAP Server and WebSphere Commerce Server. His
technical contributions have also influenced several external commercial
and academic products, prototypes and applications. Rakesh has published
more than 100 research papers and has been granted 45 patents. He is
the recipient of the ACM-SIGKDD First Innovations Award as well as the
ACM-SIGMOD 2000 Innovation Award. He is an IBM Fellow and also a Fellow
of IEEE.
Keynote: Privacy Aware Data Management and Analytics
Abstract:
The explosive progress in networking, storage, and processor
technologies is resulting in an unprecedented amount of digitization
of information. In concert with this dramatic increase in digital
data, concerns about the privacy of personal information have emerged
globally. Concerns over the massive collection of data naturally
extend to the analytic tools applied to that data. Data mining, with its
promise to efficiently discover valuable, non-obvious information from
large databases, is particularly vulnerable to misuse.
Inspired by the privacy tenet of the Hippocratic Oath, we argue that
future database systems must include responsibility for the privacy of
data they manage as a founding tenet. We enunciate the key principles
for such Hippocratic database systems, distilled from the principles
behind current privacy legislation and guidelines. We identify the
technical challenges and problems in designing Hippocratic databases,
and also outline some solution approaches.
One way of preserving the privacy of individual data records is to
perturb them. Since the primary task in data mining is the development
of models about aggregated data, we explore whether we can develop accurate models
without access to precise information in individual data records. We
consider the concrete case of building a decision-tree classifier from
perturbed data. While it is not possible to accurately estimate
original values in individual data records, we describe a
reconstruction procedure to accurately estimate the distribution of
original data values. By using these reconstructed distributions, we
are able to build classifiers whose accuracy is comparable to the
accuracy of classifiers built with the original data.
We will conclude by pointing out some open research problems.
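The perturb-then-reconstruct idea in this abstract can be illustrated with a small sketch. It assumes each value is randomized by adding independent zero-mean Gaussian noise with a known standard deviation, discretizes the value range into bins, and repeatedly applies a Bayesian update that spreads each perturbed value's probability mass over the bins in proportion to the noise density times the current estimate. The function names (`perturb`, `reconstruct`), the noise model, the bin grid, the iteration count, and the synthetic bimodal data are all illustrative assumptions, not the talk's actual implementation.

```python
import math
import random


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of the assumed zero-mean Gaussian noise at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def perturb(values: list[float], sigma: float, rng: random.Random) -> list[float]:
    """Randomize each record independently by adding Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def reconstruct(perturbed: list[float], bin_centers: list[float],
                sigma: float, iterations: int = 200) -> list[float]:
    """Estimate the distribution of the ORIGINAL values over the given bins.

    Each perturbed value w distributes one unit of mass over the bins in
    proportion to noise_density(w - a) * current_estimate(a); the summed
    mass is then normalized to become the next estimate.
    """
    k = len(bin_centers)
    # The noise likelihoods never change between iterations, so compute them once.
    lik = [[gaussian_pdf(w - a, sigma) for a in bin_centers] for w in perturbed]
    f = [1.0 / k] * k  # start from a uniform estimate
    for _ in range(iterations):
        new_f = [0.0] * k
        for row in lik:
            weights = [row[j] * f[j] for j in range(k)]
            total = sum(weights)
            if total > 0.0:
                for j in range(k):
                    new_f[j] += weights[j] / total
        mass = sum(new_f)
        if mass == 0.0:
            break
        f = [x / mass for x in new_f]
    return f


# Hypothetical attribute: half the records near 20, half near 80.
rng = random.Random(0)
original = ([rng.uniform(15, 25) for _ in range(500)]
            + [rng.uniform(75, 85) for _ in range(500)])
noisy = perturb(original, sigma=15.0, rng=rng)
centers = [5.0 + 10.0 * i for i in range(10)]  # bins [0,10), [10,20), ..., [90,100)
estimate = reconstruct(noisy, centers, sigma=15.0)
for c, p in zip(centers, estimate):
    print(f"{c:5.1f}  {p:.3f}")
```

Only the distribution over bins is estimated; no individual original value is recovered from its noisy counterpart. Larger noise generally gives more privacy but blurs the reconstructed distribution, which a classifier builder would then use in place of the raw values.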
-
Dr. Jaideep Srivastava, Univ. of Minnesota, USA
Dr. Jaideep Srivastava is a professor at the University of Minnesota. Between 1999 and 2001 he took a two-year leave, during which he spent time at Amazon.com and at Yodlee Inc. This wide-ranging industry experience has given him a unique perspective on the application of computer science technologies to many kinds of Web-based services. A researcher, educator, consultant, and invited speaker in data mining, databases, artificial intelligence, and multimedia for over 15 years, Dr. Srivastava continues to collaborate actively with the technology industry, both on research and on technology transfer. A frequently invited participant in technical and technology strategy forums, he has presented at many industry, academic, and government meetings. He has been involved in organizing a number of conferences and serves on the editorial boards of several journals. The federal government has solicited his opinion on computer science research as an expert witness, and he has served in an advisory role to the governments of India and Chile on various software technologies. Dr. Srivastava received his B.Tech. in Computer Science from the Indian Institute of Technology - Kanpur, and his M.S. and Ph.D. in Computer Science from the University of California - Berkeley.
Keynote: Web Mining - Accomplishments & Future Directions
Abstract:
From its very beginning, the potential of extracting valuable knowledge from the Web has been evident. Web mining, i.e. the application of data mining techniques to extract knowledge from Web content, structure, and usage, is the collection of technologies to fulfill this potential. Interest in Web mining has grown rapidly in its short existence, in both the research and practitioner communities. A number of new concepts, such as PageRank, hubs and authorities, web communities, and web interestingness measures, have been developed, along with techniques to compute them. In addition, a wide variety of commercial enterprises, such as Amazon, Yahoo, and Google, regularly use Web mining in their daily operations. This talk provides an overview of the accomplishments of the field, in terms of both technologies and applications, and outlines key future research directions.
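As a concrete illustration of one of the link-structure concepts named above, here is a minimal power-iteration sketch of PageRank. The damping factor of 0.85, the convention of spreading the rank of pages without out-links uniformly over all pages, the function name `pagerank`, and the toy four-page link graph are assumptions made for illustration, not details taken from the talk.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph (illustrative sketch)."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    ordered = sorted(nodes)
    n = len(ordered)
    rank = {v: 1.0 / n for v in ordered}
    for _ in range(max_iter):
        # Assumed convention: pages with no out-links spread their rank uniformly.
        dangling = sum(rank[v] for v in ordered if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in ordered}
        # Every other page passes its rank equally along its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        diff = sum(abs(new[v] - rank[v]) for v in ordered)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical four-page Web graph: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page "d" has no incoming links, so it keeps only the baseline share (1 - 0.85) / 4, while "c", which every other page points to, ends up with the highest score.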