Lecturers for Big Data School
- Social media big data analytics by Professor Jaideep Srivastava.
Bio:
Jaideep Srivastava is Professor of Computer Science & Engineering at the University of Minnesota, where he directs a laboratory focusing on research in Web Mining, Social Media Analytics, and Health Analytics. He has authored over 300 papers and supervised 30 PhD dissertations and 59 MS theses. He is currently co-leading a multi-institutional, multi-disciplinary project in the rapidly emerging area of social computing (http://vwobservatory.com/). His research has been supported by government agencies, including NSF, NASA, ARDA, DARPA, IARPA, NIH, CDC, US Army, US Air Force, and MNDoT; and by industry, including IBM, United Technologies, Eaton, Honeywell, Cargill, and Huawei Telecom. He has an active collaboration with Allina's Center for Healthcare Innovation (http://www.allina.com/ahs/aboutallina.nsf/page/health_care_innovation), where he is a Distinguished Fellow. Dr. Srivastava has significant industry experience in both consulting and executive roles. He has led a data mining team at Amazon.com (www.amazon.com), built a data analytics department at Yodlee (www.yodlee.com), and served as the Chief Technology Officer for Persistent Systems (www.persistentsys.com). He has provided technology and strategy advice to Cargill, United Technologies, IBM, Honeywell, KPMG, 3M, TCS, and Eaton. He has served as Advisor to the State Government of Minnesota and the State Government of Maharashtra, and is presently technology adviser to the UID project of the Government of India. He has held distinguished professorships at Heilongjiang University and Wuhan University, China. Dr. Srivastava has a BTech from the Indian Institute of Technology (IIT), Kanpur, India, and an MS and PhD from the University of California, Berkeley. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and has been an IEEE Distinguished Visitor.
He has given over 150 invited talks in over 30 countries, including more than a dozen keynote addresses at major international conferences. Dr. Srivastava is the Co-Founder and CTO of Ninja Metrics (www.ninjametrics.com), which brings his research in social analytics to the commercial world.
- Large scale biomedical and healthcare data mining and applications by Professor Limsoon Wong
Bio:
Limsoon Wong is a provost's chair professor of computer science and a professor of pathology at the National University of Singapore. He currently works mostly on knowledge discovery technologies and their application to biomedicine. Before that, he did significant research in database query language theory and finite model theory, as well as significant development work in broad-scale data integration systems. Limsoon has written about 150 research papers, a few of which are among the best cited in their respective fields. He serves or has served on the editorial boards of Information Systems, Journal of Bioinformatics and Computational Biology, Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Drug Discovery Today, and Journal of Biomedical Semantics. He co-founded and is chairman of Molecular Connections, a provider of data curation services employing over 700 curators, analysts, and engineers.
- Fundamental and Advanced Machine Learning Methods and Big Data Applications by Professor Geoff Webb
Bio:
Geoff Webb is a Professor of Information Technology Research in the Faculty of Information Technology at Monash University, where he heads the Centre for Research in Intelligent Systems. Prior to Monash, he held appointments at Griffith University and then Deakin University, where he received a personal chair. His primary research areas are machine learning, data mining, and user modelling. He is known for his contribution to the debate about the application of Occam's razor in machine learning and for the development of numerous methods, algorithms and techniques for machine learning, data mining and user modelling. His commercial data mining software, Magnum Opus, incorporates many techniques from his association discovery research. Many of his learning algorithms are included in the widely used Weka machine learning workbench. He is editor-in-chief of Data Mining and Knowledge Discovery, co-editor of the Springer Encyclopedia of Machine Learning, a member of the advisory board of Statistical Analysis and Data Mining, and a member of the editorial boards of Machine Learning and ACM Transactions on Knowledge Discovery from Data. He was co-PC Chair of the 2010 IEEE International Conference on Data Mining and co-General Chair of the 2012 IEEE International Conference on Data Mining.
- Mining Big Data: the state-of-the-art and beyond by Kai-Ming Ting
Abstract:
This lecture surveys state-of-the-art data mining approaches and their ability to deal with big data. The analysis focuses on two of the three Vs of big data, namely volume and velocity, and on four areas of data mining: anomaly detection, clustering, classification and information retrieval.
The lecture presents the key data mining methods and their inherent limitations, the directions in which current research is tackling these limitations, and what research in these directions can be expected to achieve.
The discussion of future research directions emphasizes the impact they will have on big data.
The audience will learn the following:
An overview of the core data mining methods: density-based methods, mass-based methods and support vector machines. The density-based methods include k-nearest neighbour and Bayesian approaches.
The key limitations of these methods in terms of time and space complexities.
Examples of dealing with big volume and big velocity.
Current research foci to reduce the impact of the inherent limitations of the core data mining methods.
Future research directions for mining big data.
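To make the time and space limitations listed above concrete, here is a minimal sketch of a textbook k-nearest-neighbour density estimate. It is an illustration only, not material taken from the lecture; the function name `knn_density` and the toy data are hypothetical. Each query scans and sorts every stored point, and the whole data set must stay in memory, which is the kind of cost that becomes a problem with big volume.

```python
import math

def knn_density(query, data, k):
    """Textbook k-NN density estimate: k / (n * volume of the ball
    that reaches the k-th nearest neighbour of `query`).

    Every call computes the distance to all n stored points and sorts
    them, so the cost per query grows with n, and all of `data` must
    be kept in memory.
    """
    dists = sorted(math.dist(query, x) for x in data)
    r = dists[k - 1]  # distance to the k-th nearest neighbour (assumed > 0)
    d = len(query)
    # Volume of a d-dimensional ball of radius r.
    volume = (math.pi ** (d / 2) / math.gamma(d / 2 + 1)) * r ** d
    return k / (len(data) * volume)

# Hypothetical one-dimensional toy data.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
estimate = knn_density((0.0,), points, k=2)
```

In one dimension the ball is an interval of length 2r, so the estimate for the toy data is 2 / (4 * 2).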
Duration:
2 hours
Target audience:
Postgraduate students and researchers in machine learning and data mining.
Bio:
Kai Ming Ting is an Associate Professor in the Faculty of Information Technology at Monash University, and currently serves as the Associate Dean Research Training in the Faculty of Information Technology. He previously held academic positions at Waikato University and Deakin University, and visiting positions at Osaka University, Japan, Nanjing University, China, and Chinese University of Hong Kong. His research projects have been supported by grants from the Australian Research Council, the US Air Force Office of Scientific Research (AFOSR/AOARD), the Australian Institute of Sport, and Toyota InfoTechnology Center (Japan). Awards received include the Runner-up Best Paper Award at IEEE ICDM 2008 and the Best Paper Award at PAKDD 2006. He received his PhD from Sydney University.
He is the creator of a new paradigm in data mining called mass estimation. Density estimation is the current paradigm on which most existing data mining algorithms are based. The unique feature of mass estimation is that it has constant time and space complexities, ideal for solving problems with big data.
His research interests are in the areas of mass estimation and mass-based approaches, ensemble approaches and data stream mining. He is an associate editor for the journal Data Mining and Knowledge Discovery. He co-chaired the Pacific-Asia Conference on Knowledge Discovery and Data Mining 2008. He has served as a member of program committees for a number of international conferences including ACM SIGKDD, IEEE ICDM and ICML.
- Large-scale Support Vector Machines: Current Research Trends and Future Directions by Haimonti Dutta
Abstract:
In supervised learning, the goal is to learn a function that describes the relationship between the training examples and their corresponding labels. This function may then be used to predict labels of unseen data (also called test data). Popular examples include handwritten character recognition and spam classification. Support Vector Machines [1] are often used to solve supervised learning problems. Traditional SVM training algorithms, such as chunking and sequential minimal optimization [2], scale super-linearly with the number of training examples and quickly become infeasible for large-scale training data. With the advent of datasets that do not fit in main memory, there is a need to study large-scale algorithms for training support vector machines. This tutorial discusses existing algorithms and the theoretical foundations of the problem, and provides insights into practical considerations when using SVMs with big data.
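As a rough illustration of why alternatives to traditional SVM solvers are attractive at scale, the sketch below trains a linear SVM by stochastic subgradient descent on the regularized hinge loss, in the style of the Pegasos algorithm. Whether the tutorial covers this particular method is an assumption; the function names, the parameter `lam` and the toy data are hypothetical. Each update touches a single training example, so the cost of one pass grows linearly with the number of examples.

```python
import random

def train_linear_svm_sgd(examples, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM
    without a bias term.

    examples: list of (features, label) pairs with label in {-1, +1}.
    lam: regularization strength (hypothetical default).
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            x, y = examples[i]
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Subgradient step on the regularizer: shrink the weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Subgradient step on the hinge loss, only if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Hypothetical toy data, linearly separable through the origin.
toy = [((2.0, 1.0), 1), ((3.0, 2.0), 1), ((2.5, 1.5), 1),
       ((-2.0, -1.0), -1), ((-3.0, -2.0), -1), ((-2.5, -1.5), -1)]
weights = train_linear_svm_sgd(toy)
```

Because the examples are visited one at a time, the same loop could in principle read them from disk or a stream instead of holding them all in memory.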
Duration:
1.5 - 2.0 hours
Outline of the tutorial:
Motivation: Large-scale Machine Learning, why bother?
A brief review of Support Vector Machines
Parallel SVM Algorithms with a discussion on convergence rate, scalability and theoretical properties.
Distributed SVM Algorithms with a discussion on convergence rate, communication cost and other theoretical properties.
Recent trends in optimization for SVMs.
Software for Large-scale SVMs
Future Work: Where is it all heading?
Target Audience:
This seminar is suitable for practitioners, researchers and students working in the field of data mining and machine learning. Working knowledge of linear algebra, probability theory, machine learning and knowledge discovery with emphasis on Support Vector Machines will be assumed.
Bio:
Haimonti Dutta holds a joint appointment at the Center for Computational Learning Systems (CCLS), Columbia University, NY and Indraprastha Institute of Information Technology (IIIT), Delhi, India. She is an Associate Research Scientist at CCLS and Assistant Professor in the Computer Science and Engineering Department at IIIT, Delhi. She received her Ph.D. in Computer Science and Electrical Engineering (CSEE) from the University of Maryland, Baltimore County (UMBC) in 2007; her thesis was on discovering patterns and knowledge from large-scale distributed systems. Her research interests include machine learning, data mining and pattern recognition; distributed optimization; data-intensive computing; and distributed and parallel data mining. She has been on the program committees of many conferences, including the Knowledge Discovery and Data Mining Conference (KDD), International Conference on Data Mining (ICDM), SIAM Data Mining Conference (SDM) and European Conference on Machine Learning (ECML), and has presented or published research papers at many prestigious venues including ICDM, SIAM Data Mining Conference, ICML, HiPC and ICMLA. Her current research is funded by the National Science Foundation, the National Endowment for the Humanities, the Epilepsy Research Foundation and industrial funding from the Consolidated Edison Company of New York. She is a recipient of the Dr B. C. Roy Scholarship for academic excellence and the UMBC Graduate Dissertation Fellowship, and was nominated for the Best Paper Award at the International Conference on Machine Learning and Applications (ICMLA) in 2008.
References:
[1] Corinna Cortes and Vladimir Vapnik. "Support vector networks." Machine Learning, pages 273-297, 1995.
[2] John Platt. "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines." Technical Report, 1998.