Program

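April 16 (Mon)
0830 - 0900  Registration
0900 - 1030  Morning tutorials and Workshop I
1030 - 1100  Coffee break
1100 - 1215  Morning tutorials and Workshop I (continued)
1400 - 1530  Afternoon tutorials and Workshop II
1530 - 1600  Coffee break
1600 - 1715  Afternoon tutorials and Workshop II (continued)
1800 - 2000  Reception

April 17 (Tue)
0800 - 0845  Registration
0845 - 0900  Conference opening
0900 - 1000  Keynote presentation
1000 - 1030  Session 1
1030 - 1100  Coffee break
1100 - 1230  Session 2
1230 - 1400  Lunch
1400 - 1500  Keynote presentation
1500 - 1530  Session 3
1530 - 1600  Coffee break
1600 - 1730  Session 4
1900 - 2130  Conference banquet

April 18 (Wed)
0900 - 1000  Keynote presentation
1000 - 1030  Session 5
1030 - 1100  Coffee break
1100 - 1230  Session 6
1230 - 1400  Lunch
1400 - 1530  Session 7
1530 - 1600  Coffee break
1600 - 1715  Session 8
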
  • On-site registration starts at 8:30am, April 16.
  • Tutorials and Workshop on April 16.
  • Conference reception at 6:00pm, April 16.
  • Plenary session on April 17-18.
  • Industrial track on April 18.
  • Conference banquet at 7:00pm, April 17.
  • Demonstrations and exhibitions on April 17-18.

Plenary papers are allotted 25 minutes each (20 minutes of presentation and 5 minutes of discussion), except papers marked *, which are allotted 15 minutes (12 minutes of presentation and 3 minutes of discussion). Each of the 8 plenary sessions consists of three parallel streams. Industrial track presentations are 30 minutes each.

 
Monday 16 April 2001
 
0830-0900

Registration

Venue : Grand Ballroom-Fanling

 
0900-1215

Morning Parallel Tutorial Program

(Coffee break at 10:30)

 
  See the Tutorial Page for details.
 
  An Introduction to MARS (Tutorial I)
  Dr Dan Steinberg, CEO of Salford Systems, USA
  Venue : Orchid-Camomile
 
  Static and Dynamic Data Mining Using Advanced Machine Learning Methods (Tutorial II)
  Professor Ryszard S. Michalski, George Mason University, USA
  Venue: Orchid-Magnolia
 
  Sequential Pattern Mining: From Shopping History Analysis to Weblog Mining and DNA Mining (Tutorial III)
  Professor Jiawei Han and Jian Pei, Simon Fraser University, Canada
  Venue : Orchid-Rose
 
1400-1715

Afternoon Parallel Tutorial Program

(Coffee break at 15:30)

 
  See the Tutorial Page for details.
 
  Recent Advances in Data Mining Algorithms for Large Databases (Tutorial IV)
  Dr Rajeev Rastogi and Dr Kyuseok Shim, USA and Korea
  Venue : Orchid-Camomile
 
  Web Mining for E-Commerce (Tutorial V)
  Professor Jaideep Srivastava, University of Minnesota, USA
  Venue : Orchid-Magnolia
 
  Workshop Program
 
  See the Workshop Page for details.
 
0900-1215

Workshop on Statistical Techniques in Data Mining with Applications (Workshop I)

Venue : Orchid-Peony

(Coffee break at 10:30)

 
1400-1715

Workshop on Mining Spatial and Temporal Data (Workshop II)

Venue: Orchid-Peony

(Coffee break at 15:30)

 
1800-2000

PAKDD 2001 Reception at Conference Hotel

Venue : Harbour Room I & II

 
Tuesday 17 April 2001
 
0800-0845

Registration

Venue : Grand Ballroom-Fanling

 
0845-0855

Conference Opening

Venue : Grand Ballroom-Fanling

 
  Guest of Honor : Mrs Sarah Kwok, Deputy Commissioner for Innovation and Technology
  Chair : David Cheung
 
0855-0900 Welcoming remarks from Conference Chair
  Jiawei Han
 
0900-1000

Keynote Presentation

Venue : Grand Ballroom-Fanling

 
  Mining E-commerce Data: The Good, the Bad, and the Ugly
  Ronny Kohavi, Blue Martini Software
  Chair : Jiawei Han
  Abstract: Electronic commerce provides all the right ingredients for successful data mining (the Good).  Web logs, however, are at a very low granularity level, and attempts to mine e-commerce data using only web logs often result in little interesting insight (the Bad).  Getting the data into minable formats requires significant pre-processing and data transformations (the Ugly).  In the ideal e-commerce architecture, high level events are logged, transformations are automated, and data mining results can easily be understood by business people who can take action quickly and efficiently. Lessons, stories, and challenges based on mining real data at Blue Martini Software will be presented.
  Biography: Ronny Kohavi is well known for his work on the Silicon Graphics MineSet project for data mining and visualization. He joined Silicon Graphics after getting a Ph.D. in Machine Learning from Stanford University, where he led the MLC++ project, the Machine Learning library in C++. Kohavi co-chaired the KDD 99 industrial track and KDD Cup 2000. He co-edited a special issue of the journal Machine Learning on Applications of Machine Learning and the special issue of the Data Mining and Knowledge Discovery journal on Applications of Data Mining to Electronic Commerce.
 
1000-1030

Session 1 Stream A: Text Mining

Chair : Huan Liu

Venue : Orchid-Rose

 
  Efficient Algorithms for Concept Space Construction
  C. Y. Ng, Joseph Lee, Felix Cheung, Ben Kao, David Cheung (Hong Kong)
 
 

Session 1 Stream B: Data Mining Tools

Chair : Ning Zhong

Venue: Orchid-Peony

 
  A Toolbox Approach to Flexible and Efficient Data Mining
  Ole M. Nielson, Peter Christen, Markus Hegland, Tatiana Semenova, Timothy Hancock (Australia)
 
 

Session 1 Stream C: Advanced Topics

Chair : Rohan Baxter

Venue: Orchid-Magnolia

 
  Knowledge Acquisition from Both Human Expert and Data
  Takuya Wada, Hiroshi Motoda, Takashi Washio (Japan)
 
1030-1100 Coffee Break
 
1100-1230

Session 2 Stream A: Text and Web Mining

Chair : Lizhu Zhou

Venue: Orchid-Rose

 
  Applying Pattern Mining to Web Information Extraction
  Chia-Hui Chang, Shao-Chen Lui, Yen-Chin Wu (Taiwan)
 
  Empirical Study of Recommender Systems Using Linear Classifiers
  Vijay S. Iyengar, Tong Zhang (USA)
 
  Hierarchical Classification of Documents with Error Control
  Chun-hung Cheng, Jian Tang, Ada Wai-chee Fu, Irwin King (Hong Kong)
 
  A Characterized Rating Recommend System*
  Yao-Tsung Lin, Shian-Shyong Tseng (Taiwan)
 
 

Session 2 Stream B: Association Rules

Chair : Hiroshi Motoda

Venue: Orchid-Peony

 
  Mining Optimal Class Association Rule Set
  Jiuyong Li, Hong Shen, Rodney Topor (Australia)
 
  Generating Frequent Patterns with the Frequent Pattern List
  Fan-Chen Tseng, Ching-Chi Hsu (Taiwan)
 
  User-Defined Association Mining
  Ke Wang, Yu He (Canada)
 
  Direct and Incremental Computing of Maximal Covering Rules*
  Marzena Kryszkiewicz (Poland)
 
 

Session 2 Stream C: Advanced Topics

Chair : Markus Hegland

Venue : Orchid-Magnolia

 
  An Efficient Data Compression Approach to the Classification Task
  Claudia Diamantini, Maurizio Panti (Italy)
 
  A Scalable Algorithm for Rule Post-Pruning of Large Decision Trees
  Trong Dung Nguyen, Tu Bao Ho, Hiroshi Shimodaira (Japan)
 
  Rule Reduction Over Numerical Attributes in Decision Trees Using Multilayer Perceptron
  DaeEun Kim, Jaeho Lee (United Kingdom)
 
  Neighborhood Dependencies for Prediction*
  Renaud Bassee, Jef Wijsen (Belgium)
 
1230-1400

Lunch

Venue : Grand Ballroom-Taipo & Shek-O

 
1400-1500

Keynote Presentation

Venue : Grand Ballroom-Fanling

 
  Incompleteness in Data Mining
  H. V. Jagadish, University of Michigan
  Chair : Graham Williams
  Abstract: Database technology, as well as the bulk of data mining technology, is founded upon logic, with absolute notions of truth and falsehood, at least with respect to the data set. Patterns are discovered exhaustively, with carefully engineered algorithms devised to determine all patterns in a data set that belong to a certain class. For large data sets, many such data mining techniques are extremely expensive, leading to considerable research towards solving these problems more cheaply.

We argue that the central goal of data mining is to find SOME interesting patterns, and not necessarily ALL of them. As such, techniques that can find most of the answers cheaply are clearly more valuable than computationally much more expensive techniques that can guarantee completeness. In fact, it is probably the case that patterns that can be found cheaply are indeed the most important ones.

Furthermore, knowledge discovery can be the most effective with the human analyst heavily involved in the endeavor. To engage a human analyst, it is important that data mining techniques be interactive, hopefully delivering (close to) real time responses and feedback. Clearly then, extreme accuracy and completeness (i.e., finding all patterns satisfying some specified criteria) would almost always be a luxury. Instead, incompleteness (i.e., finding only some patterns) and approximation would be essential.

We exemplify this discussion through the notion of fascicles. Often many records in a database share similar values for several attributes. If one is able to identify and group together records that share similar values for some - even if not all - attributes, one can both obtain a more parsimonious representation of the data, and gain useful insight into the data from a mining perspective. Such groupings are called fascicles. We explore the relationship of fascicle-finding to association rule mining, and experimentally demonstrate the benefit of incomplete but inexpensive algorithms. We also present analytical results demonstrating both the limits and the benefits of such incomplete algorithms.

  Biography: Professor Jagadish obtained his Ph.D. from Stanford and spent several years as head of the database department at AT&T. Prior to Michigan he was at the University of Illinois. His research spans many aspects of database systems, particularly in the context of the internet and XML.
 
1500-1530

Session 3 Stream A: Text Mining

Chair : Michael Ng

Venue : Orchid-Rose

 
  Topic Detection, Tracking and Trend Analysis Using Self-Organizing Neural Networks*
  K. Rajaraman, Ah-Hwee Tan (Singapore)
 
  Automatic Hypertext Construction Through a Text Mining Approach by Self-Organizing Maps*
  Hsin-Chang Yang, Chung-Hong Lee (Taiwan)
 
 

Session 3 Stream B: Association Rules

Chair : Guozhu Dong

Venue : Orchid-Peony

 
  Towards Efficient Data Re-Mining (DRM)*
  Jiming Liu, Jian Yin (Hong Kong)
 
  Data Allocation Algorithm for Parallel Association Rule Discovery*
  Anna M. Manning, John A. Keane (United Kingdom)
 
 

Session 3 Stream C: Advanced Topics

Chair : Takao Terano

Venue : Orchid-Magnolia

 
  A Similarity Indexing Method for the Data Warehousing---Bit-wise Indexing Method
  Wei-Chou Chen, Shian-Shyong Tseng, Lu-Ping Chang, Mon-Fong Jiang (Taiwan)
 
1530-1600 Coffee Break
 
1600-1730

Session 4 Stream A: Text and Web Mining

Chair : Aoying Zhou

Venue : Orchid-Rose

  Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
  Eui-Hong Han, George Karypis, Vipin Kumar (USA)
 
  Predictive Self-Organizing Networks for Text Categorization
  Ah-Hwee Tan (Singapore)
 
  Meta-Learning Models for Automatic Textual Document Categorization
  Kwok-Yin Lai, Wai Lam (Hong Kong)
 
  Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents*
  Tetsuhiro Miyahara, Takayoshi Shoudai, Tomoyuki Uchida, Kenichi Takahashi, Hiroaki Ueda (Japan)
 
 

Session 4 Stream B: Classification

Chair : Vijay Iyengar

Venue : Orchid-Peony

 
  Direct Domain Knowledge Inclusion in the PA3 Rule Induction Algorithm
  Pedro de Almeida (Portugal)
 
  Combining the Strength of Pattern Frequency and Distance for Classification
  Jinyan Li, Kotagiri Ramamohanarao, Guozhu Dong (Australia)
 
  Optimizing the Induction of Alternating Decision Trees
  Bernhard Pfahringer, Geoffrey Holmes, Richard Kirkby (New Zealand)
 
  Building Behaviour Knowledge Space to Make Classification Decision*
  Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao (Australia)
 
 

Session 4 Stream C: Feature Selection

Chair : Vincent Ng

Venue : Orchid-Magnolia

 
  Feature Selection for Temporal Health Records
  Rohan A. Baxter, Graham J. Williams, Hongxing He (Australia)
 
  Boosting the Performance of Nearest Neighbour Methods with Feature Selection
  Shlomo Geva (Australia)
 
  Feature Selection for Meta-learning
  Alexandros Kalousis, Melanie Hilario (Switzerland)
 
  Interactive Construction of Decision Trees*
  Jianchao Han, Nick Cercone (Canada)
 
1900

Conference Banquet

Venue : Orchid Room

 
Wednesday 18 April 2001
 
0900-1000

Keynote Presentation

Venue : Grand Ballroom-Fanling

  Seamless Integration of Data Mining with DBMS and Applications
  Hongjun Lu, The Hong Kong University of Science and Technology
  Chair : Qing Li
  Abstract: Data mining has been widely recognized as a powerful tool for exploring added value from data accumulated in the daily operations of an organization. A large number of data mining algorithms have been developed during the past decade. Those algorithms can be roughly divided into two groups. The first group of techniques, such as classification, clustering, prediction and deviation analysis, has been studied for a long time in machine learning, statistics, and other fields. The second group of techniques, such as association rule mining, mining in spatial-temporal databases and mining from the Web, addresses problems related to large amounts of data. Most classical algorithms in the first group assume that the data to be mined is somehow available in memory. Although initial effort in data mining has concentrated on making those algorithms scalable with respect to large volumes of data, most of those scalable algorithms, even those developed by database researchers, are still stand-alone. It is often assumed that data is available in desired forms, without considering the fact that most organizations store their data in databases managed by database management systems (DBMS). As such, most data mining algorithms can only be loosely coupled with data infrastructures in organizations and are difficult to infuse into existing mission-critical applications. Seamlessly integrating data mining techniques with database applications and database management systems remains an open problem.

In this paper, we propose to tackle the problem of seamless integration of data mining with DBMS and applications from three directions. First, with the recent development of database technology, most database management systems have extended their functionality in data analysis. Such capability should be fully explored to develop DBMS-aware data mining algorithms. Ideally, data mining algorithms can be fully implemented using DBMS-supported functions so that they become database applications themselves. Second, major difficulties in integrating data mining with applications are algorithm selection and parameter setting. Reducing or eliminating mining parameters as much as possible and developing automatic or semi-automatic mining algorithm selection techniques will greatly increase the application friendliness of data mining systems. Lastly, standardizing the interface among databases, data mining algorithms and applications can also facilitate the integration to a certain extent.

  Biography: Professor Lu is a trustee of the VLDB Endowment, a member of the ACM SIGMOD Advisory Board and serves as a member of the ACM SIGKDD International Liaisons. He is the chair of the steering committee of the International Conference on Web-Age Information Management (WAIM), and the co-chair of the steering committee of Pacific-Asia Conference of Knowledge Discovery and Data Mining (PAKDD). His research interests are in data/knowledge base management systems with emphasis on query processing and optimization, physical database design and database performance. His recent research work includes data quality, data warehousing and data mining. He is also interested in development of Internet-based database applications and electronic business systems. He has been publishing extensively in important international database conferences and journals such as SIGMOD, VLDB, ICDE, EDBT, TKDE, VLDB Journal.
 
1000-1030

Session 5 Stream A: Clustering

Chair : Ah-Hwee Tan

Venue : Orchid-Rose

 
  Criteria on Proximity Graphs for Boundary Extraction and Spatial Clustering*
  Vladimir Estivill-Castro, Ickjai Lee, Alan Murray (Australia)
 
  A Hybrid Approach to Clustering in Very Large Databases*
  Aoying Zhou, Weining Qian, Hailei Qian, Jin Wen, Shuigeng Zhou, Ye Fan (China)
 
 

Session 5 Stream B: Advanced Topics

Chair : Ada Fu

Venue : Orchid-Peony

 
  An Improved Learning Algorithm for Augmented Naive Bayes*
  Huajie Zhang, Charles X. Ling (Canada)
 
  Generalised RBF Networks Trained using IBL Algorithm for Mining Symbolic Data*
  Liviu Vladutu, Stergios Papadimitriou, Severina Mavroudi, Anastassios Bezerianos (Greece)
 
 

Session 5 Stream C: Applications and Tools

Chair : Jeffrey Yu

Venue : Orchid-Magnolia

 
  Seabreeze Prediction Using Bayesian Networks: A Case Study*
  Russell J Kennett, Kevin B Korb, Ann E Nicholson (Australia)
 
  Semi-supervised Learning in Medical Image Database*
  C. H. Li, P. C. Yuen (Hong Kong)
 
1030-1100 Coffee Break
 
1100-1230

Session 6 Stream A: Sequence Mining

Chair : Howard Hamilton

Venue : Orchid-Rose

 
 

  Generating Concept Hierarchies/Networks: Mining Additional Semantics in Relational Data
  T. Y. Lin (USA)
 
  Scalable Hierarchical Clustering Method for Sequences of Categorical Values
  Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz (Poland)
 
 

  Mining Sequence Patterns from Wind Tunnel Experimental Data for Flight Control
  Zhenyu Liu, Wesley W. Chu, Adam Huang, Chris Folk, Chih-Ming Ho (USA)
 
  Sequential Index Structure for Content-Based Retrieval*
  Maciej Zakrzewicz (Poland)
 
 

Session 6 Stream B: Applications and Tools

Chair : Chun Hung Li

Venue : Orchid-Peony

 
  iJADE eMiner---A Web-based Mining Agent based on Intelligent Java Agent Development Environment (iJADE) on Internet Shopping
  Raymond S. T. Lee, James N. K. Liu (Hong Kong)
 
  Semantic Expectation-based Causation Knowledge Extraction: A Study on Hong Kong Stock Movement Analysis
  Boon-Toh Low, Ki Chan, Lei-Lei Choi, Man-Yee Chin, Sin-Ling Lay (Hong Kong)
 
  Determining Progression in Glaucoma Using Visual Fields
  Andrew Turpin, Eibe Frank, Mark Hall, Ian H. Witten, Chris A. Johnson (New Zealand)
 
  On Application of Rough Data Mining Methods to Automatic Construction of Student Models*
  Feng-Hsu Wang, Shiou-Wen Hung (Taiwan)
 
 

Session 6 Stream C: Industry Track

Chair : Joseph Fong

Venue : Orchid-Magnolia

 
  Using Internet Survey as Mechanisms of Customer Behavior Prediction
  Dr. Dennis Peng, Founder, SuperPoll.net, Taiwan
 
  Improving the web design - mining web data at CITYJOB.com (Case Study)
  Dr H. P. Lo, Associate Professor, Department of Management Science, City University, Hong Kong
 
  Building a Credit Scorecard With SAS Enterprise Miner
  Dr. K W Cheng, Consultant, SAS Institute Ltd., Hong Kong
 
1230-1400

Lunch

Venue : Grand Ballroom-Taipo & Shek-O

 
1400-1530

Session 7 Stream A: Clustering

Chair : Charles Ling

Venue : Orchid-Rose

 
  Efficient Hierarchical Clustering Algorithms using Partially Overlapping Partitions
  Manoranjan Dash, Huan Liu (Singapore)
 
  A Rough Set-based Clustering Method with Modification of Equivalence Relations*
  Shoji Hirano, Tomohiro Okuzaki, Yutaka Hata, Shusaku Tsumoto, Kouhei Tsumoto (Japan)
 
  Importance of Individual Variables in the k-Means Algorithm*
  Juha Vesanto (Finland)
 
  Learning Bayesian Networks with Hidden Variables Using the Combination of EM and Evolutionary Algorithms*
  Tian Fengzhan, Lu Yuchang, Shi Chunyi (China)
 
 

Session 7 Stream B: Spatial and Temporal Mining

Chair : Joshua Z Huang

Venue : Orchid-Peony

 
  Patterns Discovery Based on Time-Series Decomposition
  Jeffrey Xu Yu, Michael K. Ng, Joshua Zhexue Huang (Hong Kong)
 
  Temporal Data Mining Using Hidden Markov-Local Polynomial Models
  Weiqiang Lin, Mehmet A. Orgun, Graham J. Williams (Australia)
 
  The S^2-Tree: An Index Structure for Subsequence Matching of Spatial Objects
  Haixun Wang, Chang-Shin Perng (USA)
 
  Micro Similarity Queries in Time Series Database*
  Xiao-ming Jin, Yuchang Lu, Chunyi Shi (China)
 
 

Session 7 Stream C: Industry Track

Chair : Dennis Peng

Venue : Orchid-Magnolia

 
  The Usage of Segmentation, Association and Link Analysis in Fraud Detection for Insurance
  Mr. Dick Cheung, Principal Consultant, SAS Institute Ltd., Australia
 
  Uncover Business Intelligence in Any Customer Database
  Ms. Lucy Kwan, Managing Partner, Smartal Solutions Ltd., Hong Kong
 
  Data Mining within the Financial Services Industry (Case Study - Personal Loans)
  Mr. Steven Parker, Head CRM (Customer Sales & Service), Standard Chartered Bank, Hong Kong
 
1530-1600 Coffee Break
 
1600-1715

Session 8 Stream A: Concept Hierarchies

Chair : Siu Ming Yiu

Venue : Orchid-Rose

 
  Concept Approximation in Concept Lattice
  Keyun Hu, Yuefei Sui, Yuchang Lu, Ju Wang, Chunyi Shi (China)
 
 

  FFS---An I/O-Efficient Algorithm for Mining Frequent Sequences
  Minghua Zhang, Ben Kao, Chi-Lap Yip, David Cheung (Hong Kong)

 
  Representing Large Concept Hierarchies using Lattice Data Structure
  Yanee Kachai, Kitsana Waiyamai (Thailand)
 
 

Session 8 Stream B: Interestingness

Chair : Kevin Korb

Venue : Orchid-Peony

 
  Efficient Mining of Niches and Set Routines
  Guozhu Dong, Kaustubh Deshpande (USA)
 
  Evaluation of Interestingness Measures for Ranking Discovered Knowledge
  Robert J. Hilderman, Howard J. Hamilton (Canada)
 
  Peculiarity Oriented Mining and Its Application for Knowledge Discovery in Amino-acid Data
  Ning Zhong, Muneaki Ohshima, Setsuo Ohsuga (Japan)
 
 

Session 8 Stream C: Industry Track

Chair : H P Lo

Venue : Orchid-Magnolia

 
  Enterprise-Level Business Intelligence and Data-Warehousing
  Mr. Tom Lim, IT Evangelist, Manager, Sybase Inc., Hong Kong
 
  Online Marketing Support using Online Analytical Mining Path Traversal Patterns
  Dr. Joseph Fong, Associate Professor, City University of Hong Kong, Director, Universal Data Warehousing Ltd, Hong Kong, and Irene Kwan, H K Wong