PAKDD 2006 Data Mining Competition

The 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006) is pleased to host a data mining competition, co-organized by the Singapore Institute of Statistics (SIS) and the Pattern Recognition & Machine Intelligence Association (PREMIA) of Singapore.

Problem Summary

An Asian telco operator that has successfully launched a third-generation (3G) mobile telecommunications network would like to use existing customer usage and demographic data to identify which customers are likely to switch to its 3G network.

An original sample dataset of 20,000 2G network customers and 4,000 3G network customers, with more than 200 data fields, has been provided. The target categorical variable is “Customer_Type” (2G/3G). A 3G customer is defined as a customer who has a 3G Subscriber Identity Module (SIM) card and is currently using a mobile phone compatible with a 3G network.

Three-quarters of the dataset (15K 2G, 3K 3G) will have the target field available and is meant to be used for training/testing. The remaining portion (5K 2G, 1K 3G) will be made available with the target field missing and is meant to be used for prediction.
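The split arithmetic above can be checked with a short sketch. It assumes the three-quarters split is applied to each class separately, which matches the stated counts; the names `TOTAL`, `TRAIN_FRACTION` and `split_counts` are illustrative and not part of the competition materials.

```python
# Class sizes stated in the problem summary.
TOTAL = {"2G": 20_000, "3G": 4_000}

# Assumption: three-quarters of each class carries the target field.
TRAIN_FRACTION = 0.75


def split_counts(total, train_fraction):
    """Return {label: (labelled, holdout)} counts for a per-class split."""
    counts = {}
    for label, n in total.items():
        labelled = int(round(n * train_fraction))
        counts[label] = (labelled, n - labelled)
    return counts


counts = split_counts(TOTAL, TRAIN_FRACTION)

# Share of 3G customers in the full sample (4,000 out of 24,000).
prevalence_3g = TOTAL["3G"] / sum(TOTAL.values())

for label, (labelled, holdout) in counts.items():
    print(f"{label}: {labelled} labelled, {holdout} holdout")
print(f"3G share of sample: {prevalence_3g:.3f}")
```

Since 3G customers make up only about one in six records, a model that labels every customer as 2G would find no true positives at all, so the class imbalance is worth keeping in mind when training.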

The data mining task is a classification problem whose objective is to correctly identify as many current 3G customers as possible (i.e. true positives) in the “holdout” sample provided, that is, the portion with the target field missing.
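As an illustration of the objective, the sketch below counts true positives for a set of predictions, treating “3G” as the positive class. The function names and the five toy labels are made up for this example; the announcement does not describe the organisers' actual scoring procedure, and participants could only run such a check on the labelled portion because the holdout targets are withheld.

```python
def confusion_counts(actual, predicted, positive="3G"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted 3G, really 3G
        elif p == positive:
            fp += 1  # predicted 3G, really 2G
        elif a == positive:
            fn += 1  # predicted 2G, really 3G
        else:
            tn += 1  # predicted 2G, really 2G
    return tp, fp, fn, tn


def true_positive_rate(tp, fn):
    """Fraction of actual 3G customers that were identified."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels for illustration only, not competition data.
actual = ["3G", "2G", "3G", "2G", "3G"]
predicted = ["3G", "3G", "2G", "2G", "3G"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
tpr = true_positive_rate(tp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} TPR={tpr:.2f}")
```

Tracking false positives alongside true positives is a design choice of this sketch: predicting every customer as 3G would maximise true positives trivially, so the word “accurately” in the task statement suggests both counts matter.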