The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining
21-24 June, 2010 - Hyderabad, India
PAKDD 2010 Data Mining Competition
Please follow the provided link for the complete information.
Competition website: http://sede.neurotech.com.br/PAKDD2010/
Overview
The 14th Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD 2010) is pleased to host another data mining competition, once again co-organized by NeuroTech Ltd. and the Center for Informatics of the Federal University of Pernambuco (Brazil).
This year's competition is again on the well-known application of credit scoring. However, this time it focuses on the effects of modeling on biased data that originates from previous decisions made with a high-quality decision support system.
All interesting features from last year's competition have been preserved, particularly the real-time LeaderBoard, which encourages competitors to participate daily. The webpage layout has been improved for larger screen sizes. There is now a moderated forum for discussion of the competition. Also, several binary decision metrics have been added for evaluating the performance of submissions.
The competition is open to academia and industry and can be accessed either through the PAKDD 2010 Conference website (www.iiit.ac.in/conferences/pakdd2010/) or directly on the competition server (sede.neurotech.com.br/PAKDD2010/).
Problem Summary
Re-Calibration of a Credit Risk Assessment System Based on Biased Data
The most fundamental and most frequently found type of decision is the Binary Decision. This type of decision appears in any business activity where the decision outcome is either to "do that" or to "do something else". In general, the decision is made by applying a simple threshold, which serves as the control parameter, to a propensity score.
Binary decisions could, in principle, be assessed as "successful" or "unsuccessful" for either outcome, via type-I and type-II errors, but in general only the "do that" outcome is monitored for decision assessment.
As a consequence, only part of the "market" is monitored and has its decisions assessed as "successful" or "unsuccessful". This forms a very biased sample for system re-calibration/re-training because it has been extracted from the market by a process focused on the decision objective.
This competition focuses on how to build a model for a binary decision support system based on this type of biased sample in a credit scoring application. Data are available for modeling only about the company's clients, not about the rejected applicants. This is the context of the PAKDD 2010 Competition.
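The selection effect described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the applicant records, the 0.5 cut-off, and the names `approve` and `default_rate` are hypothetical and do not come from the competition data. In practice the outcome of a rejected applicant is never observed; it is listed here only so the two samples can be compared.

```python
# Hypothetical applicants: (risk score from the earlier decision system,
# whether the applicant would default). Values are invented for illustration.
applicants = [
    (0.05, False), (0.10, False), (0.20, True), (0.30, False),
    (0.45, True), (0.60, False), (0.70, True), (0.80, True), (0.90, True),
]

THRESHOLD = 0.5  # assumed cut-off, the control parameter of the binary decision


def approve(risk_score, threshold=THRESHOLD):
    """Binary decision: grant credit only when the risk score is below the threshold."""
    return risk_score < threshold


def default_rate(rows):
    """Fraction of rows whose outcome is a default."""
    return sum(1 for _, defaulted in rows if defaulted) / len(rows)


# Only approved applicants become clients, so only their outcomes are monitored.
monitored = [(score, defaulted) for score, defaulted in applicants if approve(score)]

market_rate = default_rate(applicants)    # whole "market" (unobservable in practice)
monitored_rate = default_rate(monitored)  # the biased sample available for modeling

print(f"market default rate:    {market_rate:.3f}")
print(f"monitored default rate: {monitored_rate:.3f}")
```

In this toy setting the monitored sample shows a lower default rate than the whole market, which is the kind of distortion a model re-trained only on clients would inherit.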
These data sets come from a private label credit card operation of a Brazilian credit company and its partner shops. The official competition performance metric will be the area under the ROC curve. Some other model performance metrics will be used for comparative purposes.
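As a rough sketch of the official metric, the area under the ROC curve can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the choice of which class is labeled 1 are assumptions for illustration; this is not the organizers' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 is the assumed positive class).
    scores: iterable of model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Small hand-checkable example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of examples, so it suits small checks; for large submissions a rank-based or library implementation would be the more practical choice.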
Important Dates
March 24th
Competition announcement
Competition starts
Modeling and LeaderBoard data sets release
LeaderBoard open for submissions
April 16th
Prediction data set release
Prediction submission open (for manuscript and scores)
May 3rd
Competition submission deadline (PDF manuscript and scores)
May 17th
Competition results released
June 21st
Conference starts
Organizers
Paulo J. L. ADEODATO (Chair)
Adrian L. ARNAUD (Vice-Chair)