
Program

April 16 (Mon)
0830 - 0900	Registration, Grand Ballroom-Fanling
0900 - 1030	Workshop I, Statistical Techniques	Tutorial I	Tutorial II	Tutorial III
0900 - 1030	Orchid-Peony	Orchid-Camomile	Orchid-Magnolia	Orchid-Rose
1030 - 1100	Coffee Break, Orchid-Foyer
1100 - 1215	Workshop I, Statistical Techniques	Tutorial I	Tutorial II	Tutorial III
1100 - 1215	Orchid-Peony	Orchid-Camomile	Orchid-Magnolia	Orchid-Rose

1400 - 1530	Workshop II, Mining Spatial	Tutorial IV	Tutorial V
1400 - 1530	Orchid-Peony	Orchid-Camomile	Orchid-Magnolia
1530 - 1600	Coffee Break, Orchid-Foyer
1600 - 1715	Workshop II, Mining Spatial	Tutorial IV	Tutorial V
1600 - 1715	Orchid-Peony	Orchid-Camomile	Orchid-Magnolia

1800 - 2000	Reception, Harbour Room I & II


April 17 (Tue)
0800 - 0845	Registration, Grand Ballroom-Fanling
0845 - 0900	Conference Opening, Grand Ballroom-Fanling
0900 - 1000	Keynote Presentation, Mining E-Commerce Data, Ronny Kohavi
0900 - 1000	Grand Ballroom-Fanling
1000 - 1030	Session 1A	Session 1B	Session 1C
1000 - 1030	Orchid-Rose	Orchid-Peony	Orchid-Magnolia
1030 - 1100	Coffee Break, Orchid-Foyer
1100 - 1230	Session 2A	Session 2B	Session 2C
1100 - 1230	Orchid-Rose	Orchid-Peony	Orchid-Magnolia
1230 - 1400	Lunch, Grand Ballroom-Taipo & Shek-O
1400 - 1500	Keynote Presentation, Incompleteness in Data Mining, H.V. Jagadish
1400 - 1500	Grand Ballroom-Fanling
1500 - 1530	Session 3A	Session 3B	Session 3C
1500 - 1530	Orchid-Rose	Orchid-Peony	Orchid-Magnolia
1530 - 1600	Coffee Break, Orchid-Foyer
1600 - 1730	Session 4A	Session 4B	Session 4C
1600 - 1730	Orchid-Rose	Orchid-Peony	Orchid-Magnolia

1900 - 2130	Banquet, Orchid Room


April 18 (Wed)
0900 - 1000	Keynote Presentation, Seamless Integration of Data Mining, Hongjun Lu
0900 - 1000	Grand Ballroom-Fanling
1000 - 1030	Session 5A	Session 5B	Session 5C
1000 - 1030	Orchid-Rose	Orchid-Peony	Orchid-Magnolia
1030 - 1100	Coffee Break, Orchid-Foyer
1100 - 1230	Session 6A	Session 6B	Session 6C, Industry Track
1100 - 1230	Orchid-Rose	Orchid-Peony	Orchid-Magnolia
1230 - 1400	Lunch, Grand Ballroom-Taipo & Shek-O
1400 - 1530	Session 7A	Session 7B	Session 7C, Industry Track
1400 - 1530	Orchid-Rose	Orchid-Peony	Orchid-Magnolia
1530 - 1600	Coffee Break, Orchid-Foyer
1600 - 1715	Session 8A	Session 8B	Session 8C, Industry Track
1600 - 1715	Orchid-Rose	Orchid-Peony	Orchid-Magnolia

On-site registration starts at 8:30am, April 16.
Tutorials and workshops on April 16.
Conference reception at 6:00pm, April 16.
Plenary sessions on April 17-18.

Industrial track on April 18.
Conference banquet at 7:00pm, April 17.
Demonstrations and exhibitions on April 17-18.

Each plenary paper is allotted 25 minutes (20 minutes for presentation and 5 minutes for discussion), except papers marked *, which are allotted 15 minutes (12 minutes for presentation and 3 minutes for discussion). Each of the 8 plenary sessions consists of three parallel streams. Industrial track presentations are 30 minutes each.

Monday 16 April 2001

0830-0900
Registration

Venue : Grand Ballroom-Fanling

0900-1215
Morning Parallel Tutorial Program

(Coffee break at 10:30)

See the Tutorial Page for details.

An Introduction to MARS (Tutorial I)

Dr Dan Steinberg, CEO of Salford Systems, USA

Venue : Orchid-Camomile

Static and Dynamic Data Mining Using Advanced Machine Learning Methods (Tutorial II)

Professor Ryszard S. Michalski, George Mason University, USA

Venue: Orchid-Magnolia

Sequential Pattern Mining: From Shopping History Analysis to Weblog Mining and DNA Mining (Tutorial III)

Professor Jiawei Han and Jian Pei, Simon Fraser University, Canada

Venue : Orchid-Rose

1400-1715
Afternoon Parallel Tutorial Program

(Coffee break at 15:30)

See the Tutorial Page for details.

Recent Advances in Data Mining Algorithms for Large Databases (Tutorial IV)

Dr Rajeev Rastogi and Dr Kyuseok Shim, USA and Korea

Venue : Orchid-Camomile

Web Mining for E-Commerce (Tutorial V)

Professor Jaideep Srivastava, University of Minnesota, USA

Venue : Orchid-Magnolia

Workshop Program

See the Workshop Page for details.

0900-1215
Workshop on Statistical Techniques in Data Mining with Applications (Workshop I)

Venue : Orchid-Peony

(Coffee break at 10:30)

1400-1715
Workshop on Mining Spatial and Temporal Data (Workshop II)

Venue: Orchid-Peony

(Coffee break at 15:30)

1800-2000
PAKDD 2001 Reception at Conference Hotel

Venue : Harbour Room I & II

Tuesday 17 April 2001

0800-0845
Registration

Venue : Grand Ballroom-Fanling

0845-0855
Conference Opening

Venue : Grand Ballroom-Fanling

Guest of Honor : Mrs Sarah Kwok, Deputy Commissioner for Innovation and Technology

Chair : David Cheung

0855-0900 Welcoming remarks from Conference Chair

Jiawei Han

0900-1000
Keynote Presentation

Venue : Grand Ballroom-Fanling

Mining E-commerce Data: The Good, the Bad, and the Ugly

Ronny Kohavi, Blue Martini Software

Chair : Jiawei Han

Abstract: Electronic commerce provides all the right ingredients for successful data mining (the Good). Web logs, however, are at a very low granularity level, and attempts to mine e-commerce data using only web logs often result in little interesting insight (the Bad). Getting the data into minable formats requires significant pre-processing and data transformations (the Ugly). In the ideal e-commerce architecture, high level events are logged, transformations are automated, and data mining results can easily be understood by business people who can take action quickly and efficiently. Lessons, stories, and challenges based on mining real data at Blue Martini Software will be presented.

Biography: Ronny Kohavi is well known for his work on the Silicon Graphics MineSet project for data mining and visualization. He joined Silicon Graphics after getting a Ph.D. in Machine Learning from Stanford University, where he led the MLC++ project, the Machine Learning library in C++. Kohavi co-chaired the KDD 99 industrial track and KDD Cup 2000. He co-edited a special issue of the journal Machine Learning on Applications of Machine Learning and the special issue of the Data Mining and Knowledge Discovery journal on Applications of Data Mining to Electronic Commerce.

1000-1030
Session 1 Stream A: Text Mining

Chair : Huan Liu

Venue : Orchid-Rose

Efficient Algorithms for Concept Space Construction

C. Y. Ng, Joseph Lee, Felix Cheung, Ben Kao, David Cheung (Hong Kong)

Session 1 Stream B: Data Mining Tools

Chair : Ning Zhong

Venue: Orchid-Peony

A Toolbox Approach to Flexible and Efficient Data Mining

Ole M. Nielsen, Peter Christen, Markus Hegland, Tatiana Semenova, Timothy Hancock (Australia)

Session 1 Stream C: Advanced Topics

Chair : Rohan Baxter

Venue: Orchid-Magnolia

Knowledge Acquisition from Both Human Expert and Data

Takuya Wada, Hiroshi Motoda, Takashi Washio (Japan)

1030-1100 Coffee Break

1100-1230
Session 2 Stream A: Text and Web Mining

Chair : Lizhu Zhou

Venue: Orchid-Rose

Applying Pattern Mining to Web Information Extraction

Chia-Hui Chang, Shao-Chen Lui, Yen-Chin Wu (Taiwan)

Empirical Study of Recommender Systems Using Linear Classifiers

Vijay S. Iyengar, Tong Zhang (USA)

Hierarchical Classification of Documents with Error Control

Chun-hung Cheng, Jian Tang, Ada Wai-chee Fu, Irwin King (Hong Kong)

A Characterized Rating Recommend System*

Yao-Tsung Lin, Shian-Shyong Tseng (Taiwan)

Session 2 Stream B: Association Rules

Chair : Hiroshi Motoda

Venue: Orchid-Peony

Mining Optimal Class Association Rule Set

Jiuyong Li, Hong Shen, Rodney Topor (Australia)

Generating Frequent Patterns with the Frequent Pattern List

Fan-Chen Tseng, Ching-Chi Hsu (Taiwan)

User-Defined Association Mining

Ke Wang, Yu He (Canada)

Direct and Incremental Computing of Maximal Covering Rules*

Marzena Kryszkiewicz (Poland)

Session 2 Stream C: Advanced Topics

Chair : Markus Hegland

Venue : Orchid-Magnolia

An Efficient Data Compression Approach to the Classification Task

Claudia Diamantini, Maurizio Panti (Italy)

A Scalable Algorithm for Rule Post-Pruning of Large Decision Trees

Trong Dung Nguyen, Tu Bao Ho, Hiroshi Shimodaira (Japan)

Rule Reduction Over Numerical Attributes in Decision Trees Using Multilayer Perceptron

DaeEun Kim, Jaeho Lee (United Kingdom)

Neighborhood Dependencies for Prediction*

Renaud Bassee, Jef Wijsen (Belgium)

1230-1400
Lunch

Venue : Grand Ballroom-Taipo & Shek-O

1400-1500
Keynote Presentation

Venue : Grand Ballroom-Fanling

Incompleteness in Data Mining

H. V. Jagadish, University of Michigan

Chair : Graham Williams

Abstract: Database technology, as well as the bulk of data mining technology, is founded upon logic, with absolute notions of truth and falsehood, at least with respect to the data set. Patterns are discovered exhaustively, with carefully engineered algorithms devised to determine all patterns in a data set that belong to a certain class. For large data sets, many such data mining techniques are extremely expensive, leading to considerable research towards solving these problems more cheaply.
We argue that the central goal of data mining is to find SOME interesting patterns, and not necessarily ALL of them. As such, techniques that can find most of the answers cheaply are clearly more valuable than computationally much more expensive techniques that can guarantee completeness. In fact, it is probably the case that patterns that can be found cheaply are indeed the most important ones.
Furthermore, knowledge discovery can be the most effective with the human analyst heavily involved in the endeavor. To engage a human analyst, it is important that data mining techniques be interactive, hopefully delivering (close to) real time responses and feedback. Clearly then, extreme accuracy and completeness (i.e., finding all patterns satisfying some specified criteria) would almost always be a luxury. Instead, incompleteness (i.e., finding only some patterns) and approximation would be essential.
We exemplify this discussion through the notion of fascicles. Often many records in a database share similar values for several attributes. If one is able to identify and group together records that share similar values for some - even if not all - attributes, one can both obtain a more parsimonious representation of the data, and gain useful insight into the data from a mining perspective. Such groupings are called fascicles. We explore the relationship of fascicle-finding to association rule mining, and experimentally demonstrate the benefit of incomplete but inexpensive algorithms. We also present analytical results demonstrating both the limits and the benefits of such incomplete algorithms.

Biography: Professor Jagadish obtained his Ph.D. from Stanford and spent several years as head of the database department at AT&T. Prior to Michigan he was at the University of Illinois. His research spans many aspects of database systems, particularly in the context of the internet and XML.

1500-1530
Session 3 Stream A: Text Mining

Chair : Michael Ng

Venue : Orchid-Rose

Topic Detection, Tracking and Trend Analysis Using Self-Organizing Neural Networks*

K. Rajaraman, Ah-Hwee Tan (Singapore)

Automatic Hypertext Construction Through a Text Mining Approach by Self-Organizing Maps*

Hsin-Chang Yang, Chung-Hong Lee (Taiwan)

Session 3 Stream B: Association Rules

Chair : Guozhu Dong

Venue : Orchid-Peony

Towards Efficient Data Re-Mining (DRM)*

Jiming Liu, Jian Yin (Hong Kong)

Data Allocation Algorithm for Parallel Association Rule Discovery*

Anna M. Manning, John A. Keane (United Kingdom)

Session 3 Stream C: Advanced Topics

Chair : Takao Terano

Venue : Orchid-Magnolia

A Similarity Indexing Method for the Data Warehousing---Bit-wise Indexing Method

Wei-Chou Chen, Shian-Shyong Tseng, Lu-Ping Chang, Mon-Fong Jiang (Taiwan)

1530-1600 Coffee Break

1600-1730
Session 4 Stream A: Text and Web Mining

Chair : Aoying Zhou

Venue : Orchid-Rose

Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

Eui-Hong Han, George Karypis, Vipin Kumar (USA)

Predictive Self-Organizing Networks for Text Categorization

Ah-Hwee Tan (Singapore)

Meta-Learning Models for Automatic Textual Document Categorization

Kwok-Yin Lai, Wai Lam (Hong Kong)

Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents*

Tetsuhiro Miyahara, Takayoshi Shoudai, Tomoyuki Uchida, Kenichi Takahashi, Hiroaki Ueda (Japan)

Session 4 Stream B: Classification

Chair : Vijay Iyengar

Venue : Orchid-Peony

Direct Domain Knowledge Inclusion in the PA3 Rule Induction Algorithm

Pedro de Almeida (Portugal)

Combining the Strength of Pattern Frequency and Distance for Classification

Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong (Australia)

Optimizing the Induction of Alternating Decision Trees

Bernhard Pfahringer, Geoffrey Holmes, Richard Kirkby (New Zealand)

Building Behaviour Knowledge Space to Make Classification Decision*

Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao (Australia)

Session 4 Stream C: Feature Selection

Chair : Vincent Ng

Venue : Orchid-Magnolia

Feature Selection for Temporal Health Records

Rohan A. Baxter, Graham J. Williams, Hongxing He (Australia)

Boosting the Performance of Nearest Neighbour Methods with Feature Selection

Shlomo Geva (Australia)

Feature Selection for Meta-learning

Alexandros Kalousis, Melanie Hilario (Switzerland)

Interactive Construction of Decision Trees*

Jianchao Han, Nick Cercone (Canada)

1900
Conference Banquet

Venue : Orchid Room

Wednesday 18 April 2001

0900-1000
Keynote Presentation

Venue : Grand Ballroom-Fanling

Seamless Integration of Data Mining with DBMS and Applications

Hongjun Lu, The Hong Kong University of Science and Technology

Chair : Qing Li

Abstract: Data mining has been widely recognized as a powerful tool for exploring added value from data accumulated in the daily operations of an organization. A large number of data mining algorithms have been developed during the past decade. Those algorithms can be roughly divided into two groups. The first group of techniques, such as classification, clustering, prediction and deviation analysis, has been studied for a long time in machine learning, statistics, and other fields. The second group of techniques, such as association rule mining, mining in spatial-temporal databases and mining from the Web, addresses problems related to large amounts of data. Most classical algorithms in the first group assume that the data to be mined is somehow available in memory. Although initial effort in data mining has concentrated on making those algorithms scalable with respect to large volumes of data, most of those scalable algorithms, even those developed by database researchers, are still stand-alone. It is often assumed that data is available in desired forms, without considering the fact that most organizations store their data in databases managed by database management systems (DBMS). As such, most data mining algorithms can only be loosely coupled with data infrastructures in organizations and are difficult to infuse into existing mission-critical applications. Seamlessly integrating data mining techniques with database applications and database management systems remains an open problem.
In this paper, we propose to tackle the problem of seamless integration of data mining with DBMS and applications from three directions. First, with the recent development of database technology, most database management systems have extended their functionality in data analysis. Such capability should be fully explored to develop DBMS-aware data mining algorithms. Ideally, data mining algorithms can be fully implemented using DBMS supported functions so that they become database applications themselves. Second, major difficulties in integrating data mining with applications are algorithm selection and parameter setting. Reducing or eliminating mining parameters as much as possible and developing automatic or semi-automatic mining algorithm selection techniques will greatly increase the application friendliness of data mining systems. Lastly, standardizing the interface among databases, data mining algorithms and applications can also facilitate the integration to a certain extent.

Biography: Professor Lu is a trustee of the VLDB Endowment, a member of the ACM SIGMOD Advisory Board and serves as a member of the ACM SIGKDD International Liaisons. He is the chair of the steering committee of the International Conference on Web-Age Information Management (WAIM), and the co-chair of the steering committee of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). His research interests are in data/knowledge base management systems with emphasis on query processing and optimization, physical database design and database performance. His recent research work includes data quality, data warehousing and data mining. He is also interested in the development of Internet-based database applications and electronic business systems. He has published extensively in major international database conferences and journals such as SIGMOD, VLDB, ICDE, EDBT, TKDE, and the VLDB Journal.

1000-1030
Session 5 Stream A: Clustering

Chair : Ah-Hwee Tan

Venue : Orchid-Rose

Criteria on Proximity Graphs for Boundary Extraction and Spatial Clustering*

Vladimir Estivill-Castro, Ickjai Lee, Alan Murray (Australia)

A Hybrid Approach to Clustering in Very Large Databases*

Aoying Zhou, Weining Qian, Hailei Qian, Jin Wen, Shuigeng Zhou, Ye Fan (China)

Session 5 Stream B: Advanced Topics

Chair : Ada Fu

Venue : Orchid-Peony

An Improved Learning Algorithm for Augmented Naive Bayes*

Huajie Zhang, Charles X. Ling (Canada)

Generalised RBF Networks Trained using IBL Algorithm for Mining Symbolic Data*

Liviu Vladutu, Stergios Papadimitriou, Severina Mavroudi, Anastassios Bezerianos (Greece)

Session 5 Stream C: Applications and Tools

Chair : Jeffrey Yu

Venue : Orchid-Magnolia

Seabreeze Prediction Using Bayesian Networks: A Case Study*

Russell J Kennett, Kevin B Korb, Ann E Nicholson (Australia)

Semi-supervised Learning in Medical Image Database*

C. H. Li, P. C. Yuen (Hong Kong)

1030-1100 Coffee Break

1100-1230
Session 6 Stream A: Sequence Mining

Chair : Howard Hamilton

Venue : Orchid-Rose

Generating Concept Hierarchies/Networks: Mining Additional Semantics in Relational Data

T. Y. Lin (USA)

Scalable Hierarchical Clustering Method for Sequences of Categorical Values

Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz (Poland)

Mining Sequence Patterns from Wind Tunnel Experimental Data for Flight Control

Zhenyu Liu, Wesley W. Chu, Adam Huang, Chris Folk, Chih-Ming Ho (USA)

Sequential Index Structure for Content-Based Retrieval*

Maciej Zakrzewicz (Poland)

Session 6 Stream B: Applications and Tools

Chair : Chun Hung Li

Venue : Orchid-Peony

iJADE eMiner---A Web-based Mining Agent based on Intelligent Java Agent Development Environment (iJADE) on Internet Shopping

Raymond S. T. Lee, James N. K. Liu (Hong Kong)

Semantic Expectation-based Causation Knowledge Extraction: A Study on Hong Kong Stock Movement Analysis

Boon-Toh Low, Ki Chan, Lei-Lei Choi, Man-Yee Chin, Sin-Ling Lay (Hong Kong)

Determining Progression in Glaucoma Using Visual Fields

Andrew Turpin, Eibe Frank, Mark Hall, Ian H. Witten, Chris A. Johnson (New Zealand)

On Application of Rough Data Mining Methods to Automatic Construction of Student Models*

Feng-Hsu Wang, Shiou-Wen Hung (Taiwan)

Session 6 Stream C: Industry Track

Chair : Joseph Fong

Venue : Orchid-Magnolia

Using Internet Survey as Mechanisms of Customer Behavior Prediction

Dr. Dennis Peng, Founder, SuperPoll.net, Taiwan

Improving the web design - mining web data at CITYJOB.com (Case Study)

Dr. H. P. Lo, Associate Professor, Department of Management Science, City University, Hong Kong

Building a Credit Scorecard With SAS Enterprise Miner

Dr. K W Cheng, Consultant, SAS Institute Ltd., Hong Kong

1230-1400
Lunch

Venue : Grand Ballroom-Taipo & Shek-O

1400-1530
Session 7 Stream A: Clustering

Chair : Charles Ling

Venue : Orchid-Rose

Efficient Hierarchical Clustering Algorithms using Partially Overlapping Partitions

Manoranjan Dash, Huan Liu (Singapore)

A Rough Set-based Clustering Method with Modification of Equivalence Relations*

Shoji Hirano, Tomohiro Okuzaki, Yutaka Hata, Shusaku Tsumoto, Kouhei Tsumoto (Japan)

Importance of Individual Variables in the k-Means Algorithm*

Juha Vesanto (Finland)

Learning Bayesian Networks with Hidden Variables Using the Combination of EM and Evolutionary Algorithms*

Tian Fengzhan, Lu Yuchang, Shi Chunyi (China)

Session 7 Stream B: Spatial and Temporal Mining

Chair : Joshua Z Huang

Venue : Orchid-Peony

Patterns Discovery Based on Time-Series Decomposition

Jeffrey Xu Yu, Michael K. Ng, Joshua Zhexue Huang (Hong Kong)

Temporal Data Mining Using Hidden Markov-Local Polynomial Models

Weiqiang Lin, Mehmet A. Orgun, Graham J. Williams (Australia)

The S^2-Tree: An Index Structure for Subsequence Matching of Spatial Objects

Haixun Wang, Chang-Shin Perng (USA)

Micro Similarity Queries in Time Series Database*

Xiao-ming Jin, Yuchang Lu, Chunyi Shi (China)

Session 7 Stream C: Industry Track

Chair : Dennis Peng

Venue : Orchid-Magnolia

The Usage of Segmentation, Association and Link Analysis in Fraud Detection for Insurance

Mr. Dick Cheung, Principal Consultant, SAS Institute Ltd., Australia

Uncover Business Intelligence in Any Customer Database

Ms. Lucy Kwan, Managing Partner, Smartal Solutions Ltd., Hong Kong

Data Mining within the Financial Services Industry (Case Study - Personal Loans)

Mr. Steven Parker, Head CRM (Customer Sales & Service), Standard Chartered Bank, Hong Kong

1530-1600 Coffee Break

1600-1715
Session 8 Stream A: Concept Hierarchies

Chair : Siu Ming Yiu

Venue : Orchid-Rose

Concept Approximation in Concept Lattice

Keyun Hu, Yuefei Sui, Yuchang Lu, Ju Wang, Chunyi Shi (China)

FFS---An I/O-Efficient Algorithm for Mining Frequent Sequences

Minghua Zhang, Ben Kao, Chi-Lap Yip, David Cheung (Hong Kong)

Representing Large Concept Hierarchies using Lattice Data Structure

Yanee Kachai, Kitsana Waiyamai (Thailand)

Session 8 Stream B: Interestingness

Chair : Kevin Korb

Venue : Orchid-Peony

Efficient Mining of Niches and Set Routines

Guozhu Dong, Kaustubh Deshpande (USA)

Evaluation of Interestingness Measures for Ranking Discovered Knowledge

Robert J. Hilderman, Howard J. Hamilton (Canada)

Peculiarity Oriented Mining and Its Application for Knowledge Discovery in Amino-acid Data

Ning Zhong, Muneaki Ohshima, Setsuo Ohsuga (Japan)

Session 8 Stream C: Industry Track

Chair : H P Lo

Venue : Orchid-Magnolia

Enterprise-Level Business Intelligence and Data-Warehousing

Mr. Tom Lim, IT Evangelist, Manager, Sybase Inc., Hong Kong

Online Marketing Support using Online Analytical Mining Path Traversal Patterns

Dr. Joseph Fong, Associate Professor, City University of Hong Kong, Director, Universal Data Warehousing Ltd, Hong Kong, and Irene Kwan, H K Wong