PAKDD2002

Tutorials:

Topic	Speaker(s)
Storage and Retrieval of XML Data using Relational Databases	Kyuseok Shim and Surajit Chaudhuri
Data Analytics for Customer Relationship Management	Jaideep Srivastava
Data Clustering Analysis, from Simple Groupings to Scalable Clustering with Constraints	Osmar Zaiane and Andrew Foss

Tutorial 1:
"Storage and Retrieval of XML Data using Relational Databases"

Tutorial Abstract

The Extensible Markup Language (XML) is becoming the dominant standard for exchanging data over the World Wide Web. Due to its flexibility, XML is rapidly emerging as the de facto standard for information exchange in the next generation of web applications.

XML documents can be stored and queried using specialized semistructured-data repositories. However, such an approach does not let us exploit the features of state-of-the-art relational database technology. Moreover, large volumes of enterprise data today exist only in relational database systems. Therefore, efficient storage and retrieval of native XML data that integrates seamlessly with existing relational data is becoming important. In fact, all major commercial relational vendors support such capabilities. However, they face three challenges: (1) how to represent XML data in the relational model, (2) how to process XML queries over XML data stored in relational databases, and (3) how to publish existing relational data in XML format. We will discuss the current state-of-the-art technologies for these challenges and present future research issues.
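To make challenges (1) and (2) concrete, here is a minimal sketch of one simple mapping (a generic "edge" table in which each element becomes a row pointing to its parent), together with a hand translation of a path query into SQL self-joins. It is an illustration under stated assumptions, not the approach of any particular commercial system or of the speakers: the table name `edge`, its columns, the helper `shred`, and the sample document are all hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for a relational engine.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store each element as one row of a hypothetical generic 'edge' table."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, content TEXT)"
    )
    counter = 0

    def visit(elem, parent_id):
        nonlocal counter
        counter += 1
        node_id = counter  # ids follow document order
        content = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?)", (node_id, parent_id, elem.tag, content)
        )
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)


# Hand translation of the path query /bib/book[year > 2000]/title:
# each location step becomes one self-join on the parent column.
# (A book with several matching <year> children would yield duplicate rows;
# SELECT DISTINCT would be needed in that case.)
PATH_QUERY = """
SELECT t.content
FROM edge AS b
JOIN edge AS bk ON bk.parent = b.id AND bk.tag = 'book'
JOIN edge AS y  ON y.parent = bk.id AND y.tag = 'year'
JOIN edge AS t  ON t.parent = bk.id AND t.tag = 'title'
WHERE b.parent IS NULL AND b.tag = 'bib'
  AND CAST(y.content AS INTEGER) > 2000
ORDER BY t.id
"""

conn = sqlite3.connect(":memory:")
shred(
    "<bib>"
    "<book><title>Title A</title><year>1999</year></book>"
    "<book><title>Title B</title><year>2002</year></book>"
    "</bib>",
    conn,
)
titles = [row[0] for row in conn.execute(PATH_QUERY)]
print(titles)  # ['Title B']
```

Each XPath location step costs one join in this naive mapping, which is one reason the outline below lists alternative mappings and query processing strategies.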

Outline
1. XML and Related Technologies
(1) DTD, XML Schema
(2) XPath, XSL, XSLT, XQuery
2. Examples of XML Support in Commercial Systems
(1) Oracle
(2) IBM DB2
(3) Microsoft SQL Server
3. Research on Publishing Relational Data as XML Documents
(1) SilkRoute framework
(2) XPERANTO framework
4. Research on Storage and Retrieval of Native XML Data
(1) Various mappings
(2) XML query transformation into SQL
(3) Query processing strategies
5. Research on Structural Summary and Indexing for XML data
(1) Strong DataGuide, T-index, Representative Objects
(2) XTRACT
(3) Access Support Relation, ToX
(4) Extensions of inverted indexes, Index Fabric
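As a hedged illustration of the structural summaries in item 5, the sketch below groups the elements of a tree-shaped XML document by their root-to-node label path. For tree data without ID/IDREF links, this is essentially a strong DataGuide: one summary entry per distinct label path, each with the set of elements that path reaches. The function name `path_summary` and the sample document are hypothetical, and real DataGuides also handle graph-structured data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_summary(xml_text):
    """Map each distinct root-to-node label path to the elements it reaches."""
    summary = defaultdict(list)

    def visit(elem, prefix):
        path = prefix + "/" + elem.tag
        summary[path].append(elem)  # this element joins the target set of its path
        for child in elem:
            visit(child, path)

    visit(ET.fromstring(xml_text), "")
    return summary


doc = (
    "<bib>"
    "<book><title>Title A</title><year>1999</year></book>"
    "<book><title>Title B</title><year>2002</year><author>Author X</author></book>"
    "</bib>"
)
summary = path_summary(doc)
print({path: len(nodes) for path, nodes in summary.items()})
# {'/bib': 1, '/bib/book': 2, '/bib/book/title': 2, '/bib/book/year': 2, '/bib/book/author': 1}
print([e.text for e in summary["/bib/book/title"]])  # ['Title A', 'Title B']
```

With such a summary, a simple path query like `/bib/book/title` becomes a dictionary lookup rather than a traversal of the whole document.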

About the Speakers

Dr. Kyuseok Shim is an Assistant Professor at Seoul National University in Korea. Previously, he was an Assistant Professor at KAIST in Korea and a Member of Technical Staff at Bell Laboratories, where he was one of the key contributors to the Serendip data mining project. Before that, he worked on the Quest Data Mining project at IBM Almaden Research Center and contributed to IBM Intelligent Miner for Data. He received a B.S. degree in Electrical Engineering from Seoul National University in 1986, and M.S. and Ph.D. degrees in Computer Science from the University of Maryland, College Park, in 1988 and 1993, respectively. Kyuseok has been working in the areas of data mining and databases and has published several research papers in prestigious conferences and journals. He has served as a program committee member for the ACM SIGKDD, ACM SIGMOD, ICDE, PAKDD, and VLDB conferences, and he currently serves on the editorial boards of the VLDB and KAIS journals.

Dr. Surajit Chaudhuri is a senior researcher and manager of the Data Management, Exploration and Mining Group at Microsoft Research. He has worked extensively on self-tuning database technology, query processing, data warehousing, and data mining on SQL systems, and has published many papers in leading database conferences and journals. His work on self-tuning database technology and data mining has been incorporated into the Microsoft SQL Server product. Surajit has been a member of the program committees of leading database and data mining conferences. In 1999, he was co-chair of the ACM International Conference on Knowledge Discovery and Data Mining and co-chair of the industrial track of the ACM SIGMOD conference. Surajit received his Ph.D. from Stanford University and his B.Tech. from the Indian Institute of Technology (Kharagpur, India). Prior to joining Microsoft Research, he was a member of the research staff at Hewlett-Packard Laboratories (Palo Alto) from 1992 to 1995.

Tutorial 2:
"Data Analytics for Customer Relationship Management"

Tutorial Abstract

Corporations across the world are recognizing that intimate, one-to-one relationships with their customers are critical for survival in an increasingly global and competitive marketplace. Those that are proactive and quick-footed have taken the initiative to implement a Customer Relationship Management (CRM) system that integrates every area of the business that touches the customer (namely marketing, sales, and customer service) by coordinating people, internal processes, and technology.

A traditional CRM system typically focuses on reengineering transactions and workflows to make them customer-centric. To gain a competitive advantage, however, it is equally important to analyze business data for patterns in customer behavior that can help with customer acquisition, retention, and loyalty building. This can be achieved by coupling Data Analytics with traditional CRM.

Tremendous leaps in storage and computational power have made Data Analytics a powerful business tool for putting data across the organization to work in better decision making. Data Analytics combines data warehousing, data mining, and mathematical modeling to extract previously unknown, actionable information from business data. Because the basis of data analytics is data (the facts about what has already happened in the organization), it enables the organization to leverage past experience to make better decisions today.

This tutorial provides an up-to-date introduction to the increasingly important field of "Analytical CRM", whose goal is to provide a quantitative basis for making CRM decisions, thus helping move customer relationship management from an art to a science.

Participant Profile

This tutorial will help participants understand the key architectural and design issues related to building a data warehouse, mining it, and building and deploying analytical models. Two categories of people will benefit from this tutorial:

(i) Industry practitioners, including Chief Information Officers, Project Managers, Business Analysts, and Technical Architects who would be involved in integrating data analytic tools with customer relationship management.
(ii) Researchers and graduate students in the areas of databases, data mining, and electronic commerce, who are interested in learning about problems in an increasingly important application area.

Key Benefits
This tutorial will help the participants understand the following:
· What are the benefits of applying data analytic techniques to customer relationship management?
· What is the typical architecture and design of a data analytic application?
· How does one build a data warehouse for CRM?
· What are the various data mining techniques that can be applied?
· How does one build and deploy analytic models for customer behavior, customer service requirements, etc.?
· Are there any off-the-shelf solutions available in this area?
· What are the human resource and management issues involved in implementing a data analytic application?

Tutorial Syllabus
1. Introduction
a. What is CRM
b. What is data analytics
c. What CRM business needs does data analytics address
d. Why is now the right time for it
e. Return on investment (ROI) expectations
2. Data analytics for CRM
a. Architecture
b. Data warehousing
c. CRM model building
d. Model deployment
e. Report delivery
3. Technology analysis
a. Architecture and standards
b. Build vs. buy decisions
c. Hardware selection
d. Software selection
e. Keeping an eye on emerging players and standards
4. Team and Process Issues
a. Roles
b. Responsibilities
c. Processes
d. Critical dependencies
5. Building a CRM data warehouse
a. Data model
b. Data warehouse sizing
c. Data extraction and loading
d. Data warehouse performance tuning
6. Mining the data warehouse
a. Hypothesis formulation & testing
b. Classification
c. Clustering
d. Sequence analysis
e. Linkage analysis
7. Analytical models for customer behavior
a. Customer segmentation
b. Customer acquisition
c. Customer retention
d. Loyalty building
e. Lifetime value analysis
8. Analytical models for marketing
a. Advertisement
b. Promotions
c. Direct mail
d. E-mail
e. Campaigns
f. Cross-sell
g. Up-sell
9. Analytical models for customer service
a. Request categorization
b. Matching request to expert
c. Batched request processing
d. Customer service as a means of follow-up marketing
10. CRM model deployment issues
a. Off-line deployment
b. On-line (real time) deployment
c. Scalability
d. Reliability
11. Building a CRM reporting system
a. Report definition
b. Delivery requirements
c. Scalability & availability requirements
d. Report building
e. Change management
12. Case study: Analytical CRM for a financial services company
13. Conclusion and Discussion
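Item 7(e) lists lifetime value analysis. The sketch below illustrates one common textbook simplification, not necessarily the model presented in this tutorial: it assumes a constant margin m per period, a constant retention probability r, and a per-period discount rate d, so that CLV = sum over t = 0, ..., T-1 of m * r^t / (1 + d)^t. The function name `lifetime_value`, its parameters, and the sample inputs are hypothetical.

```python
def lifetime_value(margin, retention, discount, periods):
    """Expected discounted margin over `periods` periods (simplified retention model).

    Assumes the customer is active in period 0, stays with probability
    `retention` each period, and future margins are discounted at `discount`.
    """
    return sum(
        margin * retention**t / (1 + discount) ** t  # survive t periods, discount t periods
        for t in range(periods)
    )


print(round(lifetime_value(100.0, 0.8, 0.1, 3), 2))  # 225.62
```

As the horizon grows, the sum approaches the closed form m(1 + d) / (1 + d - r), about 366.67 for these inputs, so retention and discounting together bound how much a single customer is worth.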

About the Speaker

Jaideep Srivastava received his B.Tech. from the Indian Institute of Technology, Kanpur, India, in 1983, and his M.S. and Ph.D. from the University of California, Berkeley, in 1985 and 1988, respectively. Since 1988 he has been on the faculty of the University of Minnesota, where he is a Professor. For over 15 years he has been active as a researcher, educator, and consultant in the areas of databases, data mining, and multimedia. He has established and led a database and multimedia research laboratory, where 16 people have received doctorates and 37 have received master's degrees. Over half of the Ph.D. graduates have gone on to become faculty members, both nationally and internationally.

Throughout his career, Dr. Srivastava has collaborated actively with industry, both on research and on technology transfer. Specifically, he has collaborated with Honeywell, IBM, Fujitsu, and Apertus/Carleton for research purposes. In addition, he has been active in transferring technology to the Army, the Air Force, and the Minnesota Department of Transportation. Between 1999 and 2001, Dr. Srivastava was on leave from the University of Minnesota, during which time he served at Amazon.com (www.amazon.com) as Chief Data Mining Architect and at Yodlee Inc. (www.yodlee.com) as Director of Data Analytics. This wide-ranging industry experience has given Dr. Srivastava a unique perspective on applying computer science ideas in industry.

Dr. Srivastava is an often-invited participant in technical as well as technology strategy forums. He has given more than a hundred talks in various industry, academic, and government forums. He has organized and served on the program committee of a number of conferences. He is currently an associate editor for the IEEE Transactions on Knowledge & Data Engineering and a guest editor for the Data Mining & Knowledge Discovery Journal. The federal government has solicited his opinion on computer science research as an expert witness. He has served in an advisory role to the governments of India and Chile on various software technologies. A sample of Dr. Srivastava's research work is available at
http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/s/Srivastava:Jaideep.html.

Tutorial 3:
"Data Clustering Analysis, from Simple Groupings to Scalable Clustering with Constraints"

Tutorial Abstract

Cluster analysis is the automatic identification of groups of similar objects. Although cluster analysis has been studied extensively, we are now witnessing a significant resurgence of interest in new clustering techniques, because discovering distributions of patterns in either numerical or categorical data is relevant to many application domains. Scalability and high dimensionality are not the only focus of recent research in cluster analysis, and it is becoming difficult to keep track of all the new clustering strategies and their advantages and shortcomings.

This tutorial surveys clustering: what it is, its utility, and the various approaches to cluster and outlier discovery, with a particular focus on important recent advances in the field. Clustering is an unsupervised classification process that is fundamental to data mining. Many data mining queries are concerned either with how the data objects are grouped or with which objects could be considered remote from natural groupings. Clustering addresses both by grouping data objects so as to maximize intra-group similarity and minimize inter-group similarity. The various approaches to clustering, their basic concepts, principles, and assumptions, as well as their efficiency and effectiveness in practice, will be presented. Important issues such as handling noise, scaling to very large datasets and high dimensionality, and defining and validating cluster quality will be discussed. Finally, we cover core issues such as the discovery of clusters at different resolutions, cluster analysis in the presence of constraints, the problem of parameter setting, and whether clustering can be automated without the need for input parameters.
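The clustering objective described above (high similarity within groups, low similarity between them) can be seen in k-means, one classic partitioning method among the many approaches the tutorial surveys. The sketch below is an illustration only: it assumes numeric points given as tuples and Euclidean distance, and the function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: returns (centroids, clusters) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid of an empty cluster
        if new_centroids == centroids:  # assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Plain k-means requires k as an input parameter and is sensitive to noise and outliers, which are among the issues the abstract lists for more recent clustering methods.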

About the Speakers

Osmar R. Zaiane is an Assistant Professor at the University of Alberta, Canada. He received a master's degree in Electronics (DEA) from the University of Paris XI, France, and another master's degree in Computer Science from Laval University, Canada. He received his Ph.D. from Simon Fraser University under the supervision of Dr. Jiawei Han; his Ph.D. work concentrated on data mining from the Web and multimedia repositories. He has published more than 40 papers in international conferences and journals. Osmar Zaiane was co-chair of the First and Second International Workshops on Multimedia Data Mining (MDM/KDD), held in conjunction with ACM SIGKDD 2000 and 2001, and a guest editor for the Journal of Intelligent Information Systems. He has been an ACM member since 1986.

Andrew Foss has a BA and an MA in Physics from Oxford University and is currently a graduate student in computing science at the University of Alberta, specializing in clustering. He owns a successful software development company and has published in astronomy and computer science. He has developed several new clustering algorithms in association with Dr. Zaiane.