| Long, regular, and short paper presentations are denoted by [pink], [violet], and [green], respectively. Each long paper presentation is allocated 30 minutes (25 minutes for presentation and 5 minutes for questions); each regular paper presentation is allocated 20 minutes (17 minutes for presentation and 3 minutes for questions); and each short paper presentation is allocated 15 minutes (12 minutes for presentation and 3 minutes for questions). | ||||
| May 21 | ||||
| 8:30 - 9:00 | Opening | |||
| 9:00 - 10:00 | Keynote Speech | |||
| Chair: Einoshin Suzuki | ||||
| Christos Faloutsos; Graph Mining: Laws, Generators and Tools | ||||
| (Room A,B,C) | ||||
| 10:00 - 10:20 | Coffee Break | |||
| 10:20 - 11:50 | Session 1A: Privacy Preserving Data Mining | Session 1B: Web Mining | Session 1C: Clustering 1 | Session 1D: Network Mining |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Mei Kobayashi | Chair: Vincent S. Tseng | Chair: Saso Dzeroski | Chair: Yifeng Zeng | |
| Protecting Privacy in Incremental Maintenance for Distributed Association Rule Mining | SEM: Mining Spatial Events from the Web | A Clustering-Oriented Star Coordinate Translation Method for Reliable Clustering Parameterization | Mining Bulletin Board Systems Using Community Generation | |
| Wai Kit Wong, David Wai Lok Cheung, Edward Hung, and Huan Liu | Kaifeng Xu, Rui Li, Shenghua Bao, Dingyi Han, and Yong Yu | Chieh-Yuan Tsai and Chuang-Cheng Chiu | Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou | |
| On Addressing Accuracy Concerns in Privacy Preserving Association Rule Mining | Person Name Disambiguation in Web Pages using Social Network, Compound Words and Latent Topics | Constrained Clustering for Gene Expression Data Mining | Mining Changes in Patent Trends for Competitive Intelligence | |
| Ling Guo, Songtao Guo, and Xintao Wu | Shingo Ono, Issei Sato, Minoru Yoshida, and Hiroshi Nakagawa | Vincent S. Tseng, Lien-Chin Chen, and Ching-Pin Kao | Meng-Jung Shih, Duen-Ren Liu, and Ming-Li Hsu | |
| On Privacy in Time Series Data Mining | Fighting WebSpam: Detecting spam on the Graph via Content and Link Features | G-TREACLE: A New Grid-based and Tree-alike Pattern Clustering Technique for Large Databases | Structure-based hierarchical transformations for interactive visual exploration of social networks | |
| Ye Zhu, Yongjian Fu, and Huirong Fu | Yu-Jiu Yang, Shuang-Hong Yang, and Bao-Gang Hu | Cheng-Fa Tsai and Chia-Chen Yen | Lisa Singh, Mitchell Beard, Brian Gopalan, and Gregory Nelson | |
| | Using Ontology-Based User Preferences to Aggregate Rank Lists in Web Search | R-map: Mapping Categorical Data for Clustering and Visualization based On Reference Sets | Analyzing the Propagation of Influence and Concept Evolution in Enterprise Social Networks Through Centrality and Latent Semantic Analysis | |
| | Lin Li, Zhenglu Yang, and Masaru Kitsuregawa | Zhi-Yong Shen, Ming Li, Yi-Dong Shen, and Jun Sun | Weizhong Zhu, Chaomei Chen, and Robert B. Allen | |
| | | Efficient Joint Clustering Algorithms in Optimization and Geography Domains | A Framework for Discovering Spatio-Temporal Cohesive Networks | |
| | | Chia-Hao Lo and Wen-Chih Peng | Jin Soung Yoo and Joengmin Hwang | |
| 11:50 - 13:20 | Lunch | |||
| 13:20 - 14:20 | Invited Talk | |||
| Chair: David Cheung | ||||
| Hiroki Arimura; Efficient Algorithms for Mining Frequent and Closed Patterns from Semi-structured Data | ||||
| (Room A,B,C) | ||||
| 14:20 - 14:40 | Coffee Break | |||
| 14:40 - 16:10 | Session 2A: Feature Selection and Construction | Session 2B: Clustering 2 | Session 2C: Frequent Itemset 1 | Session 2D: Sequence Data Mining 1 |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Michel Verleysen | Chair: Xintao Wu | Chair: Chi Ming Kao | Chair: Takeaki Uno | |
| Feature Selection by Nonparametric Bayes Error Minimization | Large-scale k-means Clustering with User-Centric Privacy Preservation | LCM over ZBDDs: Fast Generation of Very Large-Scale Frequent Itemsets Using a Compact Graph-Based Representation | A Framework for Modeling Positive Class Expansion with Single Snapshot | |
| Shuang-Hong Yang and Bao-Gang Hu | Jun Sakuma and Shigenobu Kobayashi | Shin-ichi Minato, Takeaki Uno, and Hiroki Arimura | Yang Yu and Zhi-Hua Zhou | |
| Feature construction based on closedness properties is not that simple | Scaling Record Linkage to Non-Uniform Distributed Class Sizes | Efficient Mining of High Utility Itemsets from Large Datasets | Concept Lattice Based Mutation Control for Reactive Motifs Discovery | |
| Dominique Gay, Nazha Selmaoui, and Jean-Francois Boulicaut | Steffen Rendle and Lars Schmidt-Thieme | Alva Erwin, Raj P. Gopalan, and Narasimaha Achuthan | Kitsana Waiyamai, Peera Liewlom, Thanapat Kangkachit, and Thanawin Rakthanmanon | |
| Generation of Globally Relevant Continuous Features for Classification | Clustering Transaction Datasets Using Seeds | FIsViz: A Frequent Itemset Visualizer | A Simple Characterization on Serially Constructible Episodes | |
| Sylvain Letourneau, Stan Matwin, and A. Fazel Famili | Yun Sing Koh and Russel Pears | Carson Kai-Sang Leung, Pourang P. Irani, and Christopher L. Carmichael | Takashi Katoh and Kouichi Hirata | |
| | I/O Scalable Bregman Co-clustering | A Tree-Based Approach for Frequent Pattern Mining from Uncertain Data | Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features | |
| | Kuo-Wei Hsu, Arindam Banerjee, and Jaideep Srivastava | Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk | Vincent S. Tseng, Ja-Hwung Su, Jhih-Hong Huang, and Chih-Jen Chen | |
| 16:10 - 16:30 | Coffee Break | |||
| 16:30 - 18:00 | Session 3A: Frequent Itemset 2 | Session 3B: Subspace Clustering | Session 3C: Decision Tree and Class Imbalance Problem | Session 3D: Relational and Network Mining |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Carson K. Leung | Chair: Peer Kröger | Chair: Zhi-Hua Zhou | Chair: Akihiro Yamamoto | |
| Ambiguous Frequent Itemset Mining and Polynomial Delay Enumeration | Mining Quality-Aware Subspace Clusters | BOAI: Fast Alternating Decision Tree Induction based on Bottom-up Evaluation | Tracking Topic Evolution in On-line Postings: 2006 IBM Innovation Jam data | |
| Takeaki Uno and Hiroki Arimura | Ying-Ju Chen, Yi-Hong Chu, and Ming-Syan Chen | Bishan Yang, Tengjiao Wang, Dongqing Yang, and Lei Chang | Mei Kobayashi and Raylene Yung | |
| A Decremental Approach for Mining Frequent Itemsets from Uncertain Data | SubClass: Classification of Multidimensional Noisy Data Using Subspace Clusters | A comparison of different off-centered entropies to deal with class imbalance for decision trees | Entity Network Prediction using Multitype Topic Models | |
| Chun-Kit Chui and Ben Kao | Ira Assent, Ralph Krieger, Petra Welter, Jorg Herbers, and Thomas Seidl | Philippe Lenca, Stephane Lallich, Thanh-Nghi Do, and Nguyen-Khang Pham | Hitohiro Shiozaki, Koji Eguchi, and Takenao Ohkawa | |
| A Cluster-Based Genetic-Fuzzy Mining Approach for Items with Multiple Minimum Supports | A Creditable Subspace Labeling Method based on D-S Evidence Theory | Analyzing PETs on Imbalanced Datasets when Training and Testing Class Distributions Differ | Relational pattern mining based on equivalent classes of properties extracted from samples | |
| Chun-Hao Chen, Tzung-Pei Hong, and Vincent S. Tseng | Yu Zong, Xianchao Zhang, He Jiang, and Mingchu Li | David Cieslak and Nitesh Chawla | Nobuhiro Inuzuka, Jun-ichi Motoyama, Shinpei Urazawa, and Tomofumi Nakano | |
| CP-tree: A Tree Structure for Single-Pass Frequent Pattern Mining | | A New Credit Scoring Method Based on Rough Sets and Decision Tree | Exploiting Propositionalization based on Random Relational Rules for Semi-Supervised Learning | |
| Syed Khairuzzaman Tanbeer, Chowdhury Farhan Ahmed, Byeong-Soo Jeong, and Young-Koo Lee | | XiYue Zhou, DeFu Zhang, and Yi Jiang | Grant Anderson and Bernhard Pfahringer | |
| May 22 | ||||
| 9:00 - 10:00 | Invited Talk | |||
| Chair: Graham Williams | ||||
| Michael R. Berthold; Supporting Creativity: Towards Associative Discovery of New Insights | ||||
| (Room A,B,C) | ||||
| 10:00 - 10:20 | Coffee Break | |||
| 10:20 - 11:55 | Session 4A: Outlier Detection | Session 4B: SVM and Regression | Session 4C: Rule Discovery | Session 4D: Feature and Instance Selection |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Arthur Zimek | Chair: Masashi Sugiyama | Chair: Bernhard Pfahringer | Chair: Md Rafiul Hassan | |
| Unusual Pattern Detection in High Dimensions | Extreme Support Vector Machine | Minimum Variance Associations --- Discovering Relationships in Numerical Data | Automatic Training Example Selection for Unsupervised Record Linkage | |
| Minh Nguyen, Leo Mark, and Edward Omiecinski | Qiuge Liu, Qing He, and Zhongzhi Shi | Szymon Jaroszewicz | Peter Christen | |
| Unsupervised Change Analysis using Supervised Learning | A Minimal Description Length Scheme for Polynomial Regression | Mining a Complete Set of both Positive and Negative Association Rules from Large Databases | Sparse Kernel-based Feature Weighting | |
| Shohei Hido, Tsuyoshi Ide, Hisashi Kashima, Harunobu Kubo, and Hirofumi Matsuzawa | Aleksandar Peckov, Saso Dzeroski, and Ljupco Todorovski | Hao Wang, Xing Zhang, and Guoqing Chen | Shuang-Hong Yang, Yu-Jiu Yang, and Bao-Gang Hu | |
| Improving the Robustness to Outliers of Mixtures of Probabilistic PCAs | Bootstrap based Pattern Selection for Support Vector Regression | Combined Association Rule Mining | A More Topologically Stable Locally Linear Embedding Algorithm Based on R*-Tree | |
| Nicolas Delannay, Cedric Archambeau, and Michel Verleysen | Dongil Kim and Sungzoon Cho | Huaifeng Zhang, Yanchang Zhao, Longbing Cao, and Chengqi Zhang | Tian Xia, Jintao Li, Yongdong Zhang, and Sheng Tang | |
| Cell-based Outlier Detection Algorithm: A Fast Outlier Detection Algorithm for Large Datasets | Customer Churn Time Prediction in Mobile Telecommunication Industry using Ordinal Regression | Mining Non-Coincidental Rules Without A User Defined Support Threshold | Locally Linear Online Mapping for Mining Low-Dimensional Data Manifolds | |
| You Wan and Fuling Bian | Rupesh Gopal and Saroj Meher | Yun Sing Koh | Huicheng Zheng, Wei Shen, Qionghai Dai, and Sanqing Hu | |
| | | Rule Extraction with Rough-Fuzzy Hybridization Method | A Selective Classifier for Incomplete Data | |
| | | Nan-Chen Hsieh | Jingnian Chen, Houkuan Huang, Fengzhan Tian, and Shengfeng Tian | |
| 11:55 - 13:00 | Lunch | |||
| 13:00 - 18:00 | Excursion | |||
| 18:30 - 22:00 | Banquet | |||
| May 23 | ||||
| 9:00 - 10:00 | Invited Talk | |||
| Chair: Huan Liu | ||||
| Genshiro Kitagawa; Prospective Scientific Methodology in Knowledge Society | ||||
| (Room A,B,C) | ||||
| 10:00 - 10:20 | Coffee Break | |||
| 10:20 - 11:50 | Session 5A: Spatial and Image Data Mining | Session 5B: Sequence Data Mining 2 | Session 5C: Semi-Supervised Learning | Session 5D: Application 1 |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Jin Soung Yoo | Chair: Shin-ichi Minato | Chair: Nitesh V. Chawla | Chair: Masayuki Numao | |
| Towards Region Discovery in Spatial Datasets | An Efficient Algorithm for Finding Similar Short Substrings from Large Scale String Data | Semi-Supervised Local Fisher Discriminant Analysis for Dimensionality Reduction | Data-Aware Clustering Hierarchy for Wireless Sensor Networks | |
| Wei Ding, Rachsuda Jiamthapthaksin, Rachana Parmar, Dan Jiang, Tomasz Stepinski, and Christoph Eick | Takeaki Uno | Masashi Sugiyama, Tsuyoshi Ide, Shinichi Nakajima, and Jun Sese | Xiaochen Wu, Peng Wang, Wei Wang, and Baile Shi | |
| ANEMI: An Adaptive Neighborhood Expectation-Maximization Algorithm with Spatial Augmented Initialization | Accurate and Efficient Retrieval of Multimedia Time Series Data under Uniform Scaling and Time Warping | Using Supervised and Unsupervised Techniques to Determine Groups of Patients with Different Continuity of Care | Learning User Purchase Intent From User-Centric Data | |
| Tianming Hu, Hui Xiong, Xueqing Gong, and Sam Yuan Sung | Waiyawuth Euachongprasit and Chotirat Ann Ratanamahatana | Eu-Gene Siew, Leonid Churilov, Kate A. Smith-Miles, and Joachim P. Sturmberg | Rajan Lukose, Jiye Li, Jing Zhou, and Satyanarayana Raju Penmetsa | |
| A New Model for Image Annotation | Characteristic-based Descriptors for Motion Sequence Recognition | Forward Semi-Supervised Feature Selection | Exploratory Hot Spot Profile Analysis using an Interactive Visual Drill-Down Self-Organizing Maps | |
| Sanparith Marukatat | Liang Wang, Xiaozhe Wang, Christopher Leckie, and Ramamohanarao Kotagiri | Jiangtao Ren, Zhengyuan Qiu, Wei Fan, Hong Cheng, and Philip S. Yu | Denny, Graham Williams, and Peter Christen | |
| Jumping Emerging Patterns with Occurrence Count in Image Classification | | Active Learning with Misclassification Sampling Using Diverse Ensembles Enhanced by Unlabeled Instances | Discovering New Orders of the Chemical Elements through Genetic Algorithms | |
| Lukasz Kobylinski and Krzysztof Walczak | | Jun Long, Jianping Yin, En Zhu, and Wentao Zhao | Alexandre Blansche and Shuichi Iwata | |
| Combining Context and Existing Knowledge When Recognizing Biological Entities -- Early results | ||||
| Mika Timonen and Antti Pesonen | ||||
| 11:50 - 13:20 | Lunch | |||
| 13:20 - 14:20 | Invited Talk | |||
| Chair: Kai Ming Ting | ||||
| Robert C. Holte; Cost-sensitive Classifier Evaluation using Cost Curves | ||||
| (Room A,B,C) | ||||
| 14:20 - 14:40 | Coffee Break | |||
| 14:40 - 16:10 | Session 6A: Classification 1 | Session 6B: Graph and Network Mining | Session 6C: Statistical Methods | Session 6D: Text Mining 1 |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Ulf Johansson | Chair: Michael Berthold | Chair: Szymon Jaroszewicz | Chair: Manabu Okumura | |
| Multi-Class Named Entity Recognition via Bootstrapping with Dependency Tree-based Patterns | A Mixture Model for Expert Finding | A Decomposition Algorithm for Learning Bayesian Network Structures from Data | Applying Latent Semantic Indexing in Frequent Itemset Mining for Document Relation Discovery | |
| Van Dang and Akiko Aizawa | Jing Zhang, Jie Tang, Liu Liu, and Juanzi Li | Yifeng Zeng and Jorge Cordero Hernandez | Thanaruk Theeramunkong, Kritsada Sriphaew, and Manabu Okumura | |
| An efficient unordered tree kernel and its application to glycan classification | Mining Correlated Subgraphs in Graph Databases | Tradeoff Analysis of Different Markov Blanket Local Learning Approaches | Enriching WordNet with Folksonomies | |
| Tetsuji Kuboyama, Kouichi Hirata, and Kiyoko F. Aoki-Kinoshita | Tomonobu Ozaki and Takenao Ohkawa | Shunkai Fu and Michel C. Desmarais | Hao Zheng, Xian Wu, and Yong Yu | |
| Learning Rules for Multiple Target Classification | Efficient Mining of Minimal Distinguishing Subgraph Patterns from Graph Databases | Query expansion for the language modelling framework using the naive Bayes assumption | Automatic Extraction of Basis Expressions that Indicate Economic Trends | |
| Bernard Zenko and Saso Dzeroski | Zhiping Zeng, Jianyong Wang, and Lizhu Zhou | Laurence Park and Kotagiri Ramamohanarao | Hiroki Sakaji, Hiroyuki Sakai, and Shigeru Masuyama | |
| | What is Frequent in a Single Graph | On Discrete Data Modeling | Seeing several stars: a rating inference task for a document containing several evaluation criteria | |
| | Bjoern Bringmann and Siegfried Nijssen | Nizar Bouguila and Walid Elguebaly | Kazutaka Shimada and Tsutomu Endo | |
| 16:10 - 16:30 | Coffee Break | |||
| 16:30 - 18:00 | Session 7A: Classification 2 | Session 7B: Stream Mining | Session 7C: Application 2 | Session 7D: Text Mining 2 |
| (Room A) | (Room B) | (Room C) | (Room D) | |
| Chair: Robert C. Holte | Chair: Hiroki Arimura | Chair: Takashi Okada | Chair: Dirk E. Van den Poel | |
| Privacy-Preserving Linear Fisher Discriminant Analysis | Handling Numeric Attributes in Hoeffding Trees | Designing a system for a process parameter determined through modified PSO and fuzzy neural network | Term Committee Based Event Identification Within News Topics | |
| Shuguo Han and Wee Keong Ng | Bernhard Pfahringer, Geoff Holmes, and Richard Kirkby | Jui-Tsung Wong, Kuei-Hsien Chen, and Chwen-Tzeng Su | Kuo Zhang, JuanZi Li, Gang Wu, and KeHong Wang | |
| Evaluating Standard Techniques for Implicit Diversity | Maintaining Optimal Multi-way Splits for Numerical Attributes in Data Streams | Forecasting Urban Air Pollution Using HMM-fuzzy Model | A New Framework for Taxonomy Discovery from Text | |
| Ulf Johansson, Tuve Lofstrom, and Lars Niklasson | Tapio Elomaa and Petri Lehtinen | M. Maruf Hossain, Md. Rafiul Hassan, and Michael Kirley | Ahmad El Sayed, Hakim Hacid, and Djamel Zighed | |
| Local Projection in Jumping Emerging Patterns Discovery in Transaction Databases | Connectivity Based Stream Clustering Using Localised Density Exemplars | PAID: Packet Analysis for Anomaly Intrusion Detection | Detecting Near-Duplicates in Large-Scale Short Text Databases | |
| Pawel Terlecki and Krzysztof Walczak | Sebastian Luhr and Mihai Lazarescu | Kuo-Chen Lee, Jason Chang, and Ming-Syan Chen | Caichun Gong, Yulan Huang, Xueqi Cheng, and Shuo Bai | |
| Fast k Most Similar Neighbor Classifier for Mixed Data based on an Approximation and Elimination algorithm | Fast on-line estimation of the joint probability distribution | Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification | Text Categorization of Multilingual Web Pages on Specific Domain | |
| Selene Hernandez Rodriguez, J. Ariel Carrasco-Ochoa, and J. Fco. Martinez-Trinidad | Jan Peter Patist | Tomonari Masada, Senya Kiyasu, and Sueharu Miyahara | Jicheng Liu and Chunyan Liang | |
| | | The Application of Echo State Network in Stock Data Mining | | |
| | | Xiaowei Lin, Zehong Yang, and Yixu Song | | |