Accepted Tutorials
Mining in distribution sensitive environments: Tackling the problem of class imbalance and predictive uncertainties
Abstract:
Presenter: Nitesh V. Chawla, University of Notre Dame, USA.
(One Session)
Models for knowledge discovery in the real world face the pervasive and compelling problem of irregularities in data distribution. Decisions that are optimal in expected utility can be vulnerable to catastrophic failure, and value functions that reflect the discontinuities of real-world pragmatics can quickly become intractable. Surprises can happen in uncertain environments. The class distribution may be skewed, with the class of interest being rare. Imbalanced data is also often associated with asymmetric costs of misclassifying elements of different classes. Additionally, the distribution of the test data may differ from that of the learning sample, and the true misclassification costs may be unknown at learning time. The costs of mistakes and the benefits of correct predictions may also not be constant and can evolve for operational reasons. Evaluation of classifiers thus becomes a profound problem: how should classifiers be evaluated when the class distributions are skewed and the feature and class distributions are non-stationary? Various real-world applications, including but not limited to finance (credit default), direct marketing, scientific simulations, medicine, spam and fraud detection, and network security, exemplify the problem of imbalance in class distribution (the class of interest is relatively rare and is accompanied by a higher cost of error).
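To illustrate why evaluation is hard under skewed class distributions, here is a minimal sketch in Python. It is not taken from the tutorial: the data (990 negatives, 10 positives), the trivial "always negative" classifier, and the cost values are all hypothetical, chosen only to show how accuracy can look excellent while the rare class is missed entirely.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, cost_fp=1.0, cost_fn=50.0):
    """Compute accuracy, precision, recall, F1 and a total misclassification cost.

    cost_fp and cost_fn are hypothetical asymmetric costs: here a missed
    rare case is assumed to be 50 times as costly as a false alarm.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total_cost = fp * cost_fp + fn * cost_fn
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "total_cost": total_cost,
    }


# Hypothetical imbalanced sample: 1% of the examples belong to the rare class.
y_true = [0] * 990 + [1] * 10

# A trivial classifier that never predicts the rare class.
always_negative = [0] * len(y_true)

metrics = evaluate(y_true, always_negative)
print(metrics)
```

On this sample the trivial classifier is right on 990 of 1000 examples, yet its recall on the rare class is zero and it incurs the full cost of every missed positive. Accuracy alone therefore says little about performance on the class of interest.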
Short Biography:
Dr. Nitesh Chawla is an Assistant Professor in the Department of Computer Science and Engineering at the University of Notre Dame. He directs the Data Inference Analysis and Learning Lab (DIAL) and is also the co-director of the Interdisciplinary Center for Network Science and Applications (iCenSA) at Notre Dame. He has received grants or sponsored research from organizations such as the National Science Foundation, the National Institute of Justice, the Army Research Labs, and industry sponsors. He and his research team of graduate and undergraduate students have received numerous awards and honors for their research, including the outstanding dissertation award, research excellence award, best paper awards at conferences, students' research awards at Notre Dame, and first place in a classification challenge organized at the Neural Information Processing Systems (NIPS) conference. Dr. Chawla has also been noted for his teaching accomplishments, receiving the National Academy of Engineering CASEE New Faculty Fellowship and the Outstanding Undergraduate Teacher Award in 2008. He serves or has served on the organizing and program committees of a number of top-tier conferences. In an organizational capacity, he is currently the Treasurer for the ACM SIGKDD Conference 2010, the Program Chair for the NASA Conference on Intelligent Data Understanding 2010, and the Program Chair for the IEEE Conference on Computational Intelligence and Data Mining 2011. He is on the Editorial Board of IEEE Transactions on Systems, Man, and Cybernetics, Part B.
Opportunities and Challenges in Mining Biomedical Literature
Abstract:
Presenters: Martin Krallinger (Spanish National Cancer Research Center) & Ashish V Tendulkar (IIT, Madras).
(Two Sessions)
There is increasing interest in the development of biomedical text mining applications, not only to enable improved literature search but also to automatically detect pointers between biologically relevant entities described in articles and their corresponding records in existing annotation databases. The rapid growth of natural language data in the biomedical sciences (including scientific articles, patents, patient records, and database textual descriptions), together with the practical relevance of these resources for the design, interpretation and evaluation of bioinformatics and experimental research, has resulted in the implementation of a considerable number of new applications. Text-mining-assisted literature curation has been especially promising for the development and maintenance of manually annotated databases, as well as for the construction of gold standard datasets and gene lists in the context of systems biology and gene set enrichment. Attempts have also been made to integrate text mining with other bioinformatics data such as sequence, structure and gene expression information. We plan to focus primarily on applications of text mining in biomedical research and on issues in building such systems.
Short Biography:
Martin Krallinger is currently working in the Structural Biology and Biocomputing group of the Spanish National Cancer Research Center (CNIO). Previous research stays included the National Center of Biotechnology, CNB-CSIC (Spain) and the Centre of Applied Molecular Engineering (CAME, University of Salzburg, Austria). He has a strong research record in biomedical text mining, including numerous highly cited publications in the field. He has served on several international scientific conference committees (e.g. ISMB, BioLINK, ECCB, NETTAB and LBM2007) and has refereed for over 12 prestigious journals in the field (including Bioinformatics, BMC Bioinformatics, Genome Biology and PLoS Computational Biology). He was one of the main organizers of the BioCreative community challenge and one of the developers of the PLAN2L text mining system.
Ashish Tendulkar is currently working at the Indian Institute of Technology (IIT) Madras in the Department of Computer Science and Engineering. Prior to joining IIT Madras, he worked in the Structural Biology and Biocomputing group of Prof. Alfonso Valencia at the Spanish National Cancer Research Center (CNIO). Ashish completed his PhD at the Kanwal Rekhi School of Information Technology (now merged with the Department of Computer Science and Engineering), IIT Bombay. His research interests include structural bioinformatics and the mining of biological databases and text to uncover principles of complex biological systems. He is currently working on a number of text mining projects involving document categorization, extraction of biological entities and the relationships between them, the PLAN2L text mining system, and the integration of structural bioinformatics with text mining.
Dimensionality Reduction Algorithms
Abstract:
Presenter: Junbin Gao, School of Computing and Mathematics, Charles Sturt University, Australia.
(Two Sessions)
The problem of dimensionality reduction (DR), that is, extracting low dimensional structure from high dimensional data, arises in many application areas such as biological data, image data, and dynamical systems. It is a core problem in research areas such as machine learning, computer vision, data mining and statistical pattern recognition. High dimensional data takes many different forms: from digital image libraries to gene expression microarrays, from neuronal population activities to financial time series. By formulating the problem of dimensionality reduction in a general setting, however, we can analyse many different types of data in the same underlying mathematical framework.
Recent years have witnessed great advances in this area, and a number of methods have been proposed. Different points of view on the same problem have generated a considerable variety of formulations, which have not only enriched the family of DR algorithms but also provided a profound understanding of the nature of DR itself.
The goals of this tutorial are threefold: (1) to revisit the most frequently used DR algorithms; (2) to introduce the most recently developed DR methods and models and to disseminate the presenter's recent work on a unified framework for DR algorithm design; (3) to investigate the possibility of DR applications in other areas.
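As a concrete instance of extracting low dimensional structure from high dimensional data, the following minimal sketch uses principal component analysis (PCA), computed here by power iteration on the sample covariance matrix. The abstract does not name specific algorithms, so the choice of PCA is an assumption, and the synthetic data (3-D points scattered around a line with direction (1, 2, 3)) and all function names are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=200):
    """Approximate the leading eigenvector of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Map each centered row to its 1-D coordinate along direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


# Hypothetical data: 200 noisy 3-D points lying close to a single line.
random.seed(0)
direction = (1.0, 2.0, 3.0)
points = []
for _ in range(200):
    t = random.uniform(-5.0, 5.0)
    points.append([t * direction[k] + random.gauss(0.0, 0.05) for k in range(3)])

centered = center(points)
component = top_component(covariance(centered))
scores = project(centered, component)  # one number per point instead of three

print("estimated direction:", [round(x, 3) for x in component])
```

Here each 3-D point is summarized by a single coordinate along the recovered direction. Nonlinear DR methods address cases where the underlying structure is not a straight line or flat subspace.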
Short Biography:
Professor Junbin Gao graduated from Huazhong University of Science and Technology (http://www.hust.edu.cn), China, in July 1982 with a BSc degree in Computational Mathematics and obtained his PhD from Dalian University of Technology (http://www.dut.edu.cn) in July 1991. In July 2005, he joined the School of Information Technology at Charles Sturt University (http://www.csu.edu.au) as an Associate Professor in Computing Science. He was previously a Senior Lecturer (Jan-July 2005) and Lecturer (Nov 2001-Jan 2005) in Computer Science at the University of New England (http://www.une.edu.au). From 1982 to 2001 he was an Associate Lecturer, Lecturer, Associate Professor and Professor in computational mathematics in the Department of Mathematics at Huazhong University of Science and Technology.
His main research interests include machine learning, kernel methods, Bayesian learning and inference, data mining, pattern recognition and classification, and image processing. Professor Gao has published more than 100 research papers in areas including multivariate spline functions, wavelet analysis in chemometrics, pattern recognition and machine learning. He is also the author of two books, one of which is a monograph in Wiley's Analytical Chemistry and Its Applications series, titled "Chemometrics: From Basics to Wavelet Transform". For more details see http://csusap.csu.edu.au/~jbgao.