Monday 16 April 2001 |
|
|
0830-0900 |
Registration
Venue : Grand Ballroom-Fanling |
|
|
0900-1215 |
Morning Parallel Tutorial Program
(Coffee break at 10:30) |
|
|
|
See the Tutorial Page
for details. |
|
|
|
An Introduction to
MARS (Tutorial I) |
|
Dr Dan Steinberg, CEO of Salford Systems,
USA |
|
Venue : Orchid-Camomile |
|
|
|
Static and Dynamic
Data Mining Using Advanced Machine Learning Methods (Tutorial II) |
|
Professor Ryszard S. Michalski, George Mason
University, USA |
|
Venue: Orchid-Magnolia |
|
|
|
Sequential Pattern
Mining: From Shopping History Analysis to Weblog Mining and DNA Mining
(Tutorial III) |
|
Professor Jiawei Han and Jian Pei, Simon
Fraser University, Canada |
|
Venue : Orchid-Rose |
|
|
1400-1715 |
Afternoon Parallel Tutorial Program
(Coffee break at 15:30)
|
|
|
|
See the Tutorial Page
for details. |
|
|
|
Recent Advances in
Data Mining Algorithms for Large Databases (Tutorial IV) |
|
Dr Rajeev Rastogi and Dr Kyuseok Shim, USA
and Korea |
|
Venue : Orchid-Camomile |
|
|
|
Web Mining for E-Commerce
(Tutorial V) |
|
Professor Jaideep Srivastava, University
of Minnesota, USA |
|
Venue : Orchid-Magnolia |
|
|
|
Workshop Program |
|
|
|
See the Workshop Page
for details. |
|
|
0900-1215 |
Workshop on Statistical Techniques in Data
Mining with Applications (Workshop I)
Venue : Orchid-Peony
(Coffee break at 10:30) |
|
|
1400-1715 |
Workshop on Mining Spatial and Temporal Data
(Workshop II)
Venue: Orchid-Peony
(Coffee break at 15:30) |
|
|
1800-2000 |
PAKDD 2001 Reception at Conference Hotel
Venue : Harbour Room I & II |
|
|
Tuesday 17 April 2001 |
|
|
0800-0845 |
Registration
Venue : Grand Ballroom-Fanling |
|
|
0845-0855 |
Conference Opening
Venue : Grand Ballroom-Fanling |
|
|
|
Guest of Honor : Mrs Sarah Kwok, Deputy Commissioner
for Innovation and Technology |
|
Chair : David Cheung |
|
|
0855-0900 |
Welcoming remarks from Conference Chair |
|
Jiawei Han |
|
|
0900-1000 |
Keynote Presentation
Venue : Grand Ballroom-Fanling |
|
|
|
Mining
E-commerce Data: The Good, the Bad, and the Ugly |
|
Ronny Kohavi, Blue Martini Software |
|
Chair : Jiawei Han |
|
Abstract: Electronic
commerce provides all the right ingredients for successful data mining
(the Good). Web logs, however, are at a very low granularity
level, and attempts to mine e-commerce data using only web logs often
result in little interesting insight (the Bad). Getting the
data into minable formats requires significant pre-processing and
data transformations (the Ugly). In the ideal e-commerce architecture,
high level events are logged, transformations are automated, and data
mining results can easily be understood by business people who can
take action quickly and efficiently. Lessons, stories, and challenges
based on mining real data at Blue Martini Software will be presented.
|
|
Biography: Ronny Kohavi
is well known for his work on the Silicon Graphics MineSet project
for data mining and visualization. He joined Silicon Graphics after
getting a Ph.D. in Machine Learning from Stanford University, where
he led the MLC++ project, the Machine Learning library in C++. Kohavi
co-chaired the KDD 99 industrial track and KDD Cup 2000. He co-edited
a special issue of the journal Machine Learning on Applications of
Machine Learning and the special issue of the Data Mining and Knowledge
Discovery journal on Applications of Data Mining to Electronic Commerce.
|
|
|
1000-1030 |
Session 1 Stream A: Text Mining
Chair : Huan Liu
Venue : Orchid-Rose |
|
|
|
Efficient Algorithms for Concept Space
Construction |
|
C. Y. Ng, Joseph Lee, Felix Cheung, Ben Kao,
David Cheung (Hong Kong) |
|
|
|
Session 1 Stream B: Data Mining Tools
Chair : Ning Zhong
Venue: Orchid-Peony |
|
|
|
A Toolbox Approach to Flexible and Efficient
Data Mining |
|
Ole M. Nielsen, Peter Christen, Markus Hegland,
Tatiana Semenova, Timothy Hancock (Australia) |
|
|
|
Session 1 Stream C: Advanced Topics
Chair : Rohan Baxter
Venue: Orchid-Magnolia |
|
|
|
Knowledge Acquisition from Both Human
Expert and Data |
|
Takuya Wada, Hiroshi Motoda, Takashi Washio
(Japan) |
|
|
1030-1100 |
Coffee Break |
|
|
1100-1230 |
Session 2 Stream A: Text and Web Mining
Chair : Lizhu Zhou
Venue: Orchid-Rose |
|
|
|
Applying Pattern Mining to Web Information
Extraction |
|
Chia-Hui Chang, Shao-Chen Lui, Yen-Chin Wu
(Taiwan) |
|
|
|
Empirical Study of Recommender Systems
Using Linear Classifiers |
|
Vijay S. Iyengar, Tong Zhang (USA) |
|
|
|
Hierarchical Classification of Documents
with Error Control |
|
Chun-hung Cheng, Jian Tang, Ada Wai-chee
Fu, Irwin King (Hong Kong) |
|
|
|
A Characterized Rating Recommend System* |
|
Yao-Tsung Lin, Shian-Shyong Tseng (Taiwan) |
|
|
|
Session 2 Stream B: Association Rules
Chair : Hiroshi Motoda
Venue: Orchid-Peony |
|
|
|
Mining Optimal Class Association Rule
Set |
|
Jiuyong Li, Hong Shen, Rodney Topor (Australia) |
|
|
|
Generating Frequent Patterns with the
Frequent Pattern List |
|
Fan-Chen Tseng, Ching-Chi Hsu (Taiwan) |
|
|
|
User-Defined Association Mining |
|
Ke Wang, Yu He (Canada) |
|
|
|
Direct and Incremental Computing of Maximal
Covering Rules* |
|
Marzena Kryszkiewicz (Poland) |
|
|
|
Session 2 Stream C: Advanced Topics
Chair : Markus Hegland
Venue : Orchid-Magnolia |
|
|
|
An Efficient Data Compression Approach
to the Classification Task |
|
Claudia Diamantini, Maurizio Panti (Italy) |
|
|
|
A Scalable Algorithm for Rule Post-Pruning
of Large Decision Trees |
|
Trong Dung Nguyen, Tu Bao Ho, Hiroshi Shimodaira
(Japan) |
|
|
|
Rule Reduction Over Numerical Attributes
in Decision Trees Using Multilayer Perceptron |
|
DaeEun Kim, Jaeho Lee (United Kingdom) |
|
|
|
Neighborhood Dependencies for Prediction* |
|
Renaud Bassee, Jef Wijsen (Belgium) |
|
|
1230-1400 |
Lunch
Venue : Grand Ballroom-Taipo & Shek-O |
|
|
1400-1500 |
Keynote Presentation
Venue : Grand Ballroom-Fanling |
|
|
|
Incompleteness
in Data Mining |
|
H. V. Jagadish, University of Michigan |
|
Chair : Graham Williams |
|
Abstract: Database technology,
as well as the bulk of data mining technology, is founded upon logic,
with absolute notions of truth and falsehood, at least with respect
to the data set. Patterns are discovered exhaustively, with carefully
engineered algorithms devised to determine all patterns in a data
set that belong to a certain class. For large data sets, many such
data mining techniques are extremely expensive, leading to considerable
research towards solving these problems more cheaply.
We argue that the central goal of data mining is to find SOME
interesting patterns, and not necessarily ALL of them. As such,
techniques that can find most of the answers cheaply are clearly
more valuable than computationally much more expensive techniques
that can guarantee completeness. In fact, it is probably the case
that patterns that can be found cheaply are indeed the most
important ones.
Furthermore, knowledge discovery is most effective when
a human analyst is heavily involved in the endeavor. To engage a
human analyst, it is important that data mining techniques be
interactive, ideally delivering (close to) real-time responses and
feedback. Clearly then, extreme accuracy and completeness (i.e.,
finding all patterns satisfying some specified criteria)
would almost always be a luxury. Instead, incompleteness (i.e.,
finding only some patterns) and approximation would be
essential.
We exemplify this discussion through the notion of
fascicles. Often many records in a database share similar
values for several attributes. If one is able to identify and group
together records that share similar values for some - even if not
all - attributes, one can both obtain a more parsimonious
representation of the data, and gain useful insight into the data
from a mining perspective. Such groupings are called fascicles. We
explore the relationship of fascicle-finding to association rule
mining, and experimentally demonstrate the benefit of incomplete but
inexpensive algorithms. We also present analytical results
demonstrating both the limits and the benefits of such incomplete
algorithms. |
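As a rough illustration of the grouping idea described in the abstract above (not the algorithm presented in the talk), the sketch below buckets records by a few chosen numeric attributes, so that records in one bucket differ by less than a fixed width on each of them. The record fields, the `tolerances` parameter, and the function name are hypothetical. Fixed-width bucketing is a deliberately cheap, incomplete stand-in: it can miss groups that straddle a bucket boundary.

```python
from collections import defaultdict


def group_similar_records(records, tolerances):
    """Group records that fall into the same bucket on every chosen attribute.

    `tolerances` maps an attribute name to a bucket width (an assumption made
    for this sketch). Two records in the same group differ by less than the
    width on each chosen attribute; other attributes are ignored.
    """
    groups = defaultdict(list)
    for record in records:
        # Quantize each chosen attribute; the tuple of bucket ids is the key.
        key = tuple(
            int(record[attr] // width)
            for attr, width in sorted(tolerances.items())
        )
        groups[key].append(record)
    # Only groups with at least two records are worth reporting.
    return [group for group in groups.values() if len(group) >= 2]


# Hypothetical sales records: "region" differs freely inside a group.
records = [
    {"id": 1, "price": 100, "qty": 5, "region": "N"},
    {"id": 2, "price": 104, "qty": 5, "region": "S"},
    {"id": 3, "price": 250, "qty": 9, "region": "N"},
    {"id": 4, "price": 252, "qty": 9, "region": "E"},
    {"id": 5, "price": 400, "qty": 1, "region": "W"},
]

found = group_similar_records(records, {"price": 10, "qty": 1})
group_ids = [[r["id"] for r in group] for group in found]
print(group_ids)
```

Each reported group can be stored more compactly (one shared value per chosen attribute plus the remaining fields), which is the parsimony benefit the abstract refers to.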
|
Biography: Professor
Jagadish obtained his Ph.D. from Stanford and spent several years
as head of the database department at AT&T. Prior to Michigan
he was at the University of Illinois. His research spans many aspects
of database systems, particularly in the context of the internet and
XML. |
|
|
1500-1530 |
Session 3 Stream A: Text Mining
Chair : Michael Ng
Venue : Orchid-Rose |
|
|
|
Topic Detection, Tracking and Trend Analysis
Using Self-Organizing Neural Networks* |
|
K. Rajaraman, Ah-Hwee Tan (Singapore) |
|
|
|
Automatic Hypertext Construction Through
a Text Mining Approach by Self-Organizing Maps* |
|
Hsin-Chang Yang, Chung-Hong Lee (Taiwan) |
|
|
|
Session 3 Stream B: Association Rules
Chair : Guozhu Dong
Venue : Orchid-Peony |
|
|
|
Towards Efficient Data Re-Mining (DRM)* |
|
Jiming Liu, Jian Yin (Hong Kong) |
|
|
|
Data Allocation Algorithm for Parallel
Association Rule Discovery* |
|
Anna M. Manning, John A. Keane (United Kingdom) |
|
|
|
Session 3 Stream C: Advanced Topics
Chair : Takao Terano
Venue : Orchid-Magnolia |
|
|
|
A Similarity Indexing Method for the Data
Warehousing---Bit-wise Indexing Method |
|
Wei-Chou Chen, Shian-Shyong Tseng, Lu-Ping
Chang, Mon-Fong Jiang (Taiwan) |
|
|
1530-1600 |
Coffee Break |
|
|
1600-1730 |
Session 4 Stream A: Text and Web Mining
Chair : Aoying Zhou
Venue : Orchid-Rose |
|
Text Categorization Using Weight Adjusted
k-Nearest Neighbor Classification |
|
Eui-Hong Han, George Karypis, Vipin Kumar
(USA) |
|
|
|
Predictive Self-Organizing Networks for
Text Categorization |
|
Ah-Hwee Tan (Singapore) |
|
|
|
Meta-Learning Models for Automatic Textual
Document Categorization |
|
Kwok-Yin Lai, Wai Lam (Hong Kong) |
|
|
|
Discovery of Frequent Tree Structured
Patterns in Semistructured Web Documents* |
|
Tetsuhiro Miyahara, Takayoshi Shoudai, Tomoyuki
Uchida, Kenichi Takahashi, Hiroaki Ueda (Japan) |
|
|
|
Session 4 Stream B: Classification
Chair : Vijay Iyengar
Venue : Orchid-Peony |
|
|
|
Direct Domain Knowledge Inclusion in the
PA3 Rule Induction Algorithm |
|
Pedro de Almeida (Portugal) |
|
|
|
Combining the Strength of Pattern Frequency
and Distance for Classification |
|
Jinyan Li, Kotagiri Ramamohanarao, Guozhu
Dong (Australia) |
|
|
|
Optimizing the Induction of Alternating
Decision Trees |
|
Bernhard Pfahringer, Geoffrey Holmes, Richard
Kirkby (New Zealand) |
|
|
|
Building Behaviour Knowledge Space to
Make Classification Decision* |
|
Xiuzhen Zhang, Guozhu Dong, Kotagiri Ramamohanarao
(Australia) |
|
|
|
Session 4 Stream C: Feature Selection
Chair : Vincent Ng
Venue : Orchid-Magnolia |
|
|
|
Feature Selection for Temporal Health
Records |
|
Rohan A. Baxter, Graham J. Williams, Hongxing
He (Australia) |
|
|
|
Boosting the Performance of Nearest Neighbour
Methods with Feature Selection |
|
Shlomo Geva (Australia) |
|
|
|
Feature Selection for Meta-learning |
|
Alexandros Kalousis, Melanie Hilario (Switzerland) |
|
|
|
Interactive Construction of Decision Trees* |
|
Jianchao Han, Nick Cercone (Canada) |
|
|
1900 |
Conference Banquet
Venue : Orchid Room |
|
|
Wednesday 18 April 2001 |
|
|
0900-1000 |
Keynote Presentation
Venue : Grand Ballroom-Fanling |
|
Seamless
Integration of Data Mining with DBMS and Applications |
|
Hongjun Lu, The Hong Kong University of Science
and Technology |
|
Chair : Qing Li |
|
Abstract: Data mining
has been widely recognized as a powerful tool for exploring added
value from data accumulated in the daily operations of an organization.
A large number of data mining algorithms have been developed during
the past decade. Those algorithms can be roughly divided into two
groups. The first group of techniques, such as classification, clustering,
prediction and deviation analysis, has been studied for a long time
in machine learning, statistics, and other fields. The second group
of techniques, such as association rule mining, mining in spatial-temporal
databases and mining from the Web, addresses problems related to large
amounts of data. Most classical algorithms in the first group assume
that the data to be mined is somehow available in memory. Although
initial effort in data mining has concentrated on making those algorithms
scalable with respect to large volumes of data, most of those scalable
algorithms, even those developed by database researchers, are still stand-alone.
It is often assumed that data is available in desired forms, without
considering the fact that most organizations store their data in databases
managed by database management systems (DBMS). As such, most data
mining algorithms can only be loosely coupled with data infrastructures
in organizations and are difficult to infuse into existing mission-critical
applications. Seamlessly integrating data mining techniques with database
applications and database management systems remains an open problem.
In this paper, we propose to tackle the problem of seamless
integration of data mining with DBMS and applications from three
directions. First, with the recent development of database
technology, most database management systems have extended their
functionality in data analysis. Such capability should be fully
explored to develop DBMS-aware data mining algorithms. Ideally, data
mining algorithms can be fully implemented using DBMS-supported
functions so that they become database applications themselves.
Second, major difficulties in integrating data mining with
applications are algorithm selection and parameter setting. Reducing
or eliminating mining parameters as much as possible and developing
automatic or semi-automatic mining algorithm selection techniques
will greatly increase the application friendliness of data mining
systems. Lastly, standardizing the interface among databases, data
mining algorithms and applications can also facilitate the
integration to a certain extent. |
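To make the first direction in the abstract above concrete, here is a minimal sketch of a mining step expressed entirely with DBMS-supported functions: counting the support of item pairs with a self-join and `GROUP BY`. It is an illustration under stated assumptions, not the speaker's method. The `baskets` table, its columns, and the sample rows are hypothetical, SQLite stands in for the DBMS, and each (tid, item) row is assumed to appear only once.

```python
import sqlite3

# In-memory database standing in for an organization's DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "eggs"),
        (3, "milk"), (3, "eggs"),
        (4, "bread"), (4, "milk"),
    ],
)

min_support = 2  # hypothetical threshold: pair must occur in >= 2 baskets

# The self-join pairs items within one transaction; a.item < b.item keeps
# each unordered pair once. The DBMS does the counting and the filtering.
frequent_pairs = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM baskets AS a
    JOIN baskets AS b ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY support DESC, a.item, b.item
    """,
    (min_support,),
).fetchall()

print(frequent_pairs)
conn.close()
```

Because the work happens inside the query, the data never has to be exported into a stand-alone tool, which is the kind of tight coupling the abstract argues for.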
|
Biography: Professor
Lu is a trustee of the VLDB Endowment, a member of the ACM SIGMOD
Advisory Board and serves as a member of the ACM SIGKDD International
Liaisons. He is the chair of the steering committee of the International
Conference on Web-Age Information Management (WAIM), and the co-chair
of the steering committee of the Pacific-Asia Conference on Knowledge
Discovery and Data Mining (PAKDD). His research interests are in data/knowledge
base management systems with emphasis on query processing and optimization,
physical database design and database performance. His recent research
work includes data quality, data warehousing and data mining. He is
also interested in the development of Internet-based database applications
and electronic business systems. He has published extensively
in major international database conferences and journals such
as SIGMOD, VLDB, ICDE, EDBT, TKDE, and the VLDB Journal. |
|
|
1000-1030 |
Session 5 Stream A: Clustering
Chair : Ah-Hwee Tan
Venue : Orchid-Rose |
|
|
|
Criteria on Proximity Graphs for Boundary
Extraction and Spatial Clustering* |
|
Vladimir Estivill-Castro, Ickjai Lee, Alan
Murray (Australia) |
|
|
|
A Hybrid Approach to Clustering in Very
Large Databases* |
|
Aoying Zhou, Weining Qian, Hailei Qian, Jin
Wen, Shuigeng Zhou, Ye Fan (China) |
|
|
|
Session 5 Stream B: Advanced Topics
Chair : Ada Fu
Venue : Orchid-Peony |
|
|
|
An Improved Learning Algorithm for Augmented
Naive Bayes* |
|
Huajie Zhang, Charles X. Ling (Canada) |
|
|
|
Generalised RBF Networks Trained using
IBL Algorithm for Mining Symbolic Data* |
|
Liviu Vladutu, Stergios Papadimitriou, Severina
Mavroudi, Anastassios Bezerianos (Greece) |
|
|
|
Session 5 Stream C: Applications and Tools
Chair : Jeffrey Yu
Venue : Orchid-Magnolia |
|
|
|
Seabreeze Prediction Using Bayesian Networks:
A Case Study* |
|
Russell J Kennett, Kevin B Korb, Ann E Nicholson
(Australia) |
|
|
|
Semi-supervised Learning in Medical Image
Database* |
|
C. H. Li, P. C. Yuen (Hong Kong) |
|
|
1030-1100 |
Coffee Break |
|
|
1100-1230 |
Session 6 Stream A: Sequence Mining
Chair : Howard Hamilton
Venue : Orchid-Rose |
|
|
|
Generating Concept Hierarchies/Networks: Mining Additional
Semantics in Relational Data |
|
T. Y. Lin (USA) |
|
|
|
Scalable Hierarchical Clustering Method
for Sequences of Categorical Values |
|
Tadeusz Morzy, Marek Wojciechowski, Maciej
Zakrzewicz (Poland) |
|
|
|
Mining Sequence Patterns from Wind Tunnel Experimental Data
for Flight Control |
|
Zhenyu Liu, Wesley W. Chu, Adam Huang, Chris
Folk, Chih-Ming Ho (USA) |
|
|
|
Sequential Index Structure for Content-Based
Retrieval* |
|
Maciej Zakrzewicz (Poland) |
|
|
|
Session 6 Stream B: Applications and Tools
Chair : Chun Hung Li
Venue : Orchid-Peony |
|
|
|
iJADE eMiner---A Web-based Mining Agent
based on Intelligent Java Agent Development Environment (iJADE) on
Internet Shopping |
|
Raymond S. T. Lee, James N. K. Liu (Hong
Kong) |
|
|
|
Semantic Expectation-based Causation Knowledge
Extraction: A Study on Hong Kong Stock Movement Analysis |
|
Boon-Toh Low, Ki Chan, Lei-Lei Choi, Man-Yee
Chin, Sin-Ling Lay (Hong Kong) |
|
|
|
Determining Progression in Glaucoma Using
Visual Fields |
|
Andrew Turpin, Eibe Frank, Mark Hall, Ian
H. Witten, Chris A. Johnson (New Zealand) |
|
|
|
On Application of Rough Data Mining Methods
to Automatic Construction of Student Models* |
|
Feng-Hsu Wang, Shiou-Wen Hung (Taiwan) |
|
|
|
Session 6 Stream C: Industry Track
Chair : Joseph Fong
Venue : Orchid-Magnolia |
|
|
|
Using Internet Survey as Mechanisms
of Customer Behavior Prediction |
|
Dr. Dennis Peng, Founder, SuperPoll.net,
Taiwan |
|
|
|
Improving the web design - mining web
data at CITYJOB.com (Case Study) |
|
Dr H. P. Lo, Associate Professor, Department
of Management Science, City University, Hong Kong |
|
|
|
Building a Credit Scorecard With SAS Enterprise
Miner |
|
Dr. K W Cheng, Consultant, SAS Institute
Ltd., Hong Kong |
|
|
1230-1400 |
Lunch
Venue : Grand Ballroom-Taipo & Shek-O |
|
|
1400-1530 |
Session 7 Stream A: Clustering
Chair : Charles Ling
Venue : Orchid-Rose |
|
|
|
Efficient Hierarchical Clustering Algorithms
using Partially Overlapping Partitions |
|
Manoranjan Dash, Huan Liu (Singapore) |
|
|
|
A Rough Set-based Clustering Method with
Modification of Equivalence Relations* |
|
Shoji Hirano, Tomohiro Okuzaki, Yutaka Hata,
Shusaku Tsumoto, Kouhei Tsumoto (Japan) |
|
|
|
Importance of Individual Variables in
the k-Means Algorithm* |
|
Juha Vesanto (Finland) |
|
|
|
Learning Bayesian Networks with Hidden
Variables Using the Combination of EM and Evolutionary Algorithms* |
|
Tian Fengzhan, Lu Yuchang, Shi Chunyi (China) |
|
|
|
Session 7 Stream B: Spatial and Temporal
Mining
Chair : Joshua Z Huang
Venue : Orchid-Peony |
|
|
|
Patterns Discovery Based on Time-Series
Decomposition |
|
Jeffrey Xu Yu, Michael K. Ng, Joshua Zhexue
Huang (Hong Kong) |
|
|
|
Temporal Data Mining Using Hidden Markov-Local
Polynomial Models |
|
Weiqiang Lin, Mehmet A. Orgun, Graham J.
Williams (Australia) |
|
|
|
The S^2-Tree: An Index Structure for Subsequence
Matching of Spatial Objects |
|
Haixun Wang, Chang-Shin Perng (USA) |
|
|
|
Micro Similarity Queries in Time Series
Database* |
|
Xiao-ming Jin, Yuchang Lu, Chunyi Shi (China) |
|
|
|
Session 7 Stream C: Industry Track
Chair : Dennis Peng
Venue : Orchid-Magnolia |
|
|
|
The Usage of Segmentation, Association
and Link Analysis in Fraud Detection for Insurance |
|
Mr. Dick Cheung, Principal Consultant, SAS
Institute Ltd., Australia |
|
|
|
Uncover Business Intelligence
in Any Customer Database |
|
Ms. Lucy Kwan, Managing Partner,
Smartal Solutions Ltd., Hong Kong |
|
|
|
Data
Mining within the Financial Services Industry (Case Study - Personal
Loans) |
|
Mr. Steven Parker, Head CRM (Customer
Sales & Service), Standard Chartered Bank, Hong Kong |
|
|
1530-1600 |
Coffee Break |
|
|
1600-1715 |
Session 8 Stream A: Concept Hierarchies
Chair : Siu Ming Yiu
Venue : Orchid-Rose |
|
|
|
Concept Approximation in Concept Lattice |
|
Keyun Hu, Yuefei Sui, Yuchang Lu, Ju Wang,
Chunyi Shi (China) |
|
|
|
FFS---An I/O-Efficient Algorithm for Mining Frequent
Sequences |
|
Minghua Zhang, Ben Kao, Chi-Lap Yip, David Cheung (Hong
Kong) |
|
|
|
Representing Large Concept Hierarchies
using Lattice Data Structure |
|
Yanee Kachai, Kitsana Waiyamai (Thailand) |
|
|
|
Session 8 Stream B: Interestingness
Chair : Kevin Korb
Venue : Orchid-Peony |
|
|
|
Efficient Mining of Niches and Set Routines |
|
Guozhu Dong, Kaustubh Deshpande (USA) |
|
|
|
Evaluation of Interestingness Measures
for Ranking Discovered Knowledge |
|
Robert J. Hilderman, Howard J. Hamilton (Canada) |
|
|
|
Peculiarity Oriented Mining and Its Application
for Knowledge Discovery in Amino-acid Data |
|
Ning Zhong, Muneaki Ohshima, Setsuo Ohsuga
(Japan) |
|
|
|
Session 8 Stream C: Industry Track
Chair : H P Lo
Venue : Orchid-Magnolia |
|
|
|
Enterprise-Level Business Intelligence
and Data-Warehousing |
|
Mr. Tom Lim, IT Evangelist, Manager, Sybase
Inc., Hong Kong |
|
|
|
Online Marketing Support using Online
Analytical Mining Path Traversal Patterns |
|
Dr. Joseph Fong, Associate Professor, City
University of Hong Kong, Director, Universal Data Warehousing Ltd,
Hong Kong, and Irene Kwan, H K Wong |
|
|
|
|