The 19th Pacific-Asia Conference on

Knowledge Discovery and Data Mining

Tutorials

Tutorial 1

Crowdsourcing for Big Data Analytics

Presenters: Hisashi Kashima, Satoshi Oyama, Yukino Baba

 

Abstract. Automated data analysis technologies developed in data mining are certainly at the core of big data analytics. However, it is also well known that automatically analyzing all of the heterogeneous, complex, and unstructured data in the real world is not realistic, so a significant amount of manual data processing by humans is unavoidable.

 

Crowdsourcing is a relatively new idea for outsourcing human intelligence tasks to a large number of unspecified people via the internet, and it is attracting considerable attention as a promising way to resolve the human bottleneck in big data analysis.

 

In this tutorial, we will start by introducing the basic concept of crowdsourcing and how it is used to execute data mining processes in big data analysis. We then focus on two major uses of crowdsourcing, data collection/annotation and modeling, and the technical issues that accompany them, including quality control problems in crowdsourcing results. Various data-driven approaches to these problems will be introduced. Finally, we will address safety and ethical issues such as security, privacy, and fairness, which are unavoidable in crowdsourced data analytics, and introduce technical efforts to alleviate them.
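As a rough illustration of what a simple quality control step for crowdsourced annotation can look like, the following Python sketch aggregates redundant labels by majority vote and then scores each worker by agreement with the consensus. It is not taken from the tutorial material: the function names, the `(worker, item, label)` tuple layout, and the toy data are all assumptions made for this example, and more refined data-driven approaches would model worker ability explicitly.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate (worker, item, label) triples into one consensus label per item.

    Hypothetical helper for illustration. Ties are broken in favor of the
    label that was seen first for that item.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in annotations:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_agreement(annotations, consensus):
    """Fraction of each worker's labels that match the consensus label."""
    matched = Counter()
    total = Counter()
    for worker, item, label in annotations:
        total[worker] += 1
        if consensus[item] == label:
            matched[worker] += 1
    return {worker: matched[worker] / total[worker] for worker in total}


# Toy data (assumed): three workers label two images.
annotations = [
    ("w1", "img1", "cat"), ("w2", "img1", "cat"), ("w3", "img1", "dog"),
    ("w1", "img2", "dog"), ("w2", "img2", "dog"), ("w3", "img2", "dog"),
]
consensus = majority_vote(annotations)
agreement = worker_agreement(annotations, consensus)
```

Workers with low agreement scores could then be down-weighted or excluded, which is one of the simplest ways to use the collected data itself to control quality.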

 

Presenters’ biographical sketch.

Hisashi Kashima is a professor in the Department of Intelligence Science and Technology, Kyoto University. Before joining the faculty, he was a research staff member at IBM Research Tokyo from April 1999 to July 2009, and an associate professor in the Department of Mathematical Informatics, The University of Tokyo, from August 2009 to March 2014. His research focuses on the foundations of machine learning and data mining and on their applications in various fields, including marketing, bio-chemo-informatics, industrial/business intelligence, and crowdsourcing. His previous research includes the development of kernel methods for structured data such as trees and graphs, predictive modeling of networks, including biological and social networks, and anomaly detection for industrial systems; these contributions have received awards from academic societies. He has also contributed to businesses based on machine learning techniques and to a number of issued and disclosed patents. He obtained a B.S. degree in applied mathematics and physics in 1997, an M.S. degree in systems engineering in 1999, and a Ph.D. degree in informatics in 2007 from Kyoto University, Japan.

 

Satoshi Oyama is an associate professor in the Graduate School of Information Science and Technology, Hokkaido University, Japan. He received his B.Eng., M.Eng., and Ph.D. degrees from Kyoto University in 1994, 1996, and 2002, respectively. He was a research fellow of the Japan Society for the Promotion of Science from 2001 to 2002. He was an assistant professor in the Graduate School of Informatics at Kyoto University from 2002 to 2009. He was a visiting assistant professor in the Department of Computer Science at Stanford University from 2003 to 2004. His research interests include machine learning, data mining, information retrieval, crowdsourcing, and human computation.

 

Yukino Baba is a project research associate in the Global Research Center for Big Data Mathematics at the National Institute of Informatics, and in the JST ERATO Kawarabayashi Large Graph Project. Her current research focuses on designing AI systems that harness human intelligence and coordinate the crowd to solve complex problems. Her work involves developing data mining methods in various areas such as crowdsourcing and social media analysis. She received her B.E. degree (2007) from Tokyo University of Science, and a master's degree (2009) and a Ph.D. degree (2012) in Information Science and Technology from The University of Tokyo. She then worked as a postdoctoral researcher at The University of Tokyo (2012-2014).

 

 

 

Tutorial 2

Differential Privacy and Its Applications

Presenters: Gang Li, Tianqing Zhu and Wanlei Zhou

 

Abstract. Differential privacy has become an important research area since the first publication on information disclosure in 2006. Extensive work has since been done to develop this new concept because it constitutes a rigorous and provable privacy notion that can be applied in various research areas.

 

Differential privacy captures the intuition that releasing an aggregated report should not reveal too much information about any individual record in the dataset. This can be achieved using randomized mechanisms whose output distribution remains almost unchanged even when an arbitrary individual record is deleted.
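To make the idea of a randomized mechanism concrete, here is a minimal Python sketch of the widely used Laplace mechanism applied to a counting query. It is an illustration under stated assumptions rather than material from the tutorial: the function names `laplace_noise` and `private_count`, the privacy parameter name `epsilon`, and the toy `ages` data are all chosen for this example. Because deleting one record changes a count by at most 1, noise with scale 1/epsilon is added to the true count.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale.

    The difference of two independent exponential variates with rate 1/scale
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Hypothetical helper for illustration. A counting query has sensitivity 1
    (one record changes the count by at most 1), so the noise scale is
    1 / epsilon. Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy data (assumed): how many people are aged 30 or older?
ages = [23, 35, 41, 29, 52, 38]
noisy = private_count(ages, lambda age: age >= 30, epsilon=0.5,
                      rng=random.Random(0))
```

Each call returns a different noisy answer centered on the true count of 4. Repeated queries on the same data consume additional privacy budget, which this sketch does not track.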

 

In this tutorial, we will start by introducing the basic concept of differential privacy and several scenarios in which it can be used for data release and analysis. Based on these scenarios, we then focus on two major research directions: differentially private data release and differentially private data analysis. Work on data release has focused on how to modify the original dataset or the queries so as to guarantee differential privacy while preserving acceptable dataset utility, whereas work on data analysis has concentrated on how to modify the data mining algorithm to satisfy differential privacy while retaining high mining accuracy. A newly emerging research area, coupled differential privacy, which assumes that records in a dataset are coupled with each other, will then be covered. Finally, we will present some popular applications of differential privacy and envisage future research directions.

 

Presenters’ biographical sketch. 

 

Gang Li, a senior member of the IEEE, received his PhD degree from Deakin University, Australia, in 2005. He is currently a senior lecturer in the School of IT at Deakin University. He has co-authored four papers that won best paper prizes, including the PAKDD2014 best student paper, the ACM/IEEE ASONAM2012 best paper award, and the 2007 Nightingale Prize of the Springer journal Medical and Biological Engineering and Computing. He serves on the IEEE Computational Intelligence Society "Data Mining and Big Data Analytics" Technical Committee (DMTC), is an associate editor for Decision Support Systems (Elsevier), and has been a guest editor for the Chinese Journal of Computer, Enterprise Information System, Concurrency and Computation: Practice and Experience, and Future Generation Computer Systems, among others.

 

Tianqing Zhu received her BEng and MEng degrees from Wuhan University, China, in 2000 and 2004, respectively, and a PhD degree in Computer Science from Deakin University, Australia, in 2014. Dr Tianqing Zhu is currently a continuing teaching scholar in the School of Information Technology, Deakin University. Before joining Deakin University, she served as a lecturer at Wuhan Polytechnic University, China, from 2004 to 2011. Her research interests include privacy preservation, data mining, and network security. She won the best student paper award at PAKDD 2014.

 

Wanlei Zhou received B.Eng and M.Eng degrees from Harbin Institute of Technology, Harbin, China, in 1982 and 1984, respectively, and a PhD degree from The Australian National University, Canberra, Australia, in 1991, all in Computer Science and Engineering. He also received a DSc degree (a higher doctorate degree) from Deakin University in 2002. He is currently the Alfred Deakin Professor (the highest honour the University can bestow on a member of academic staff) and Chair Professor in the School of Information Technology, Deakin University. Before joining Deakin University, he served as a lecturer at the University of Electronic Science and Technology of China; a system programmer at HP in Massachusetts, USA; a lecturer at Monash University, Melbourne, Australia; and a lecturer at the National University of Singapore, Singapore. His research interests include distributed systems, network security, bioinformatics, and e-learning. He has published more than 300 papers in refereed international journals and refereed international conference proceedings, including around 30 articles in IEEE journals in the last 5 years. He has co-authored four papers that won best paper prizes, including the PAKDD2014 best student paper and the ACM/IEEE ASONAM2012 best paper award. He has also chaired many international conferences. Prof Zhou is a Senior Member of the IEEE.

 

 

 

Tutorial 3

Behavior Computing: Deep Behavior Analytics and Active Behavior Management

Presenter: Longbing Cao

 

Abstract. Complex behaviors are widely seen in artificial and natural intelligent systems, on the internet, in social and online networks, in multi-agent systems, and in brain systems. An in-depth understanding of complex behaviors is increasingly recognized as a crucial means of disclosing the interior driving forces, causes, and business impact involved in handling many challenging issues. This creates the need for, and has led to the emergence of, behavior informatics, i.e., understanding behaviors from a computing perspective.

 

Traditional behavior modeling mainly relies on qualitative methods from behavioral science and social science perspectives. So-called behavior analysis often focuses on human demographic and business usage data, in which behavior-oriented elements are hidden in routinely collected transactional data. As a result, it is ineffective or even impossible to deeply scrutinize native behavior intention, lifecycle, dynamics, and impact on complex problems and business issues. As shown in Fig. 1, two directions can be developed to give a global picture of behavior informatics: qualitative and quantitative behavior analytics. Given a formal representation of coupled behaviors, qualitative analytics addresses behavior reasoning and verification, while quantitative research targets behavior learning and evaluation. Finally, an appropriate way can be chosen to combine these two studies to obtain an integrated understanding of implicit complex behaviors from both qualitative and quantitative aspects. During this process, many open issues are worthy of systematic investigation, in aspects such as behavior reasoning, behavior learning, behavior evaluation, and behavior integration, at the individual level and, even more so, at the group level.

In this tutorial, we present an overview of behavior informatics and discuss complex behavior representation, behavioral feature construction, behavior impact analysis, behavior pattern analysis, negative behavior analysis, behavior interaction and evolution, high-impact behavior analysis, high-utility behavior analysis, group and coupled behavior analysis, etc. Several real-world case studies are demonstrated, including analyzing exceptional market microstructure behaviors, mining for high-impact social security behavior patterns, detecting abnormal pool manipulation behaviors, analyzing student learning progression, and analyzing online banking behavior interactions.
We show that behavior informatics creates new opportunities, directions and means for qualitative and quantitative, formal and systematic modeling, learning and analysis of complex behaviors in both physical and virtual

 

Presenters’ biographical sketch.

Longbing Cao is a professor of information technology at the University of Technology Sydney (UTS), Australia. He holds two PhDs, one in Intelligent Sciences and another in Computing Science. He is the Founding Director of the Advanced Analytics Institute at UTS. He is also the Research Leader of the Data Mining Program at the Australian Capital Markets Cooperative Research Centre. He is a Senior Member of the IEEE, the SMC Society, and the Computer Society. His primary research interests include data mining and machine learning, behavior informatics, agent mining, multi-agent systems, and open complex intelligent systems. He initiated and is leading the research on behavior informatics, domain driven data mining, agent mining, and open complex intelligent systems. He has chaired around 30 conferences and workshops, for example as program co-chair for IAT11 and local arrangement chair of IAT-WI08, and has served as a PC member for over 70 international conferences and workshops, including KDD, WI, IAT and AAMAS. He works or has worked with several tier-1 organizations on enterprise data mining and behavior analysis, in areas including the public sector, banking, insurance, telecommunication, capital markets, and education. His expertise and experience in enterprise applications of intelligent data analysis and intelligent systems and decisions are highly respected in the relevant communities, covering real-world areas such as fraud detection, outlier detection, risk management, social security, market surveillance, health care, insurance, investment, and marketing.