Tutorial: Data mining algorithms

Vipin Kumar
University of Minnesota/AHPCRC

Data mining is the process of efficient supervised or unsupervised discovery of interesting, useful, and previously unknown information from data. Tasks well suited to data mining include classification, discovery of association rules, and clustering. Classification is the supervised learning of a model from a training set whose records are pre-classified into different classes. The discovered model can then be used to classify new data. Application domains include retail target marketing and fraud detection. Association rules describe patterns of co-occurrence of items in a transaction database. Unsupervised discovery of association rules has applications in many areas, including shelf planning and inventory control. Clustering is a discovery process that groups a set of data into clusters such that the data within a cluster are more similar to one another than to data in different clusters. Applications include market segmentation, targeted marketing, and organization of information on the World Wide Web. This tutorial will provide an overview of a variety of algorithms that are commonly used for classification, association rule discovery, and clustering.
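
To make the notion of co-occurrence patterns in a transaction database concrete, the following is a minimal Python sketch, not taken from the tutorial itself. It uses the standard support and confidence measures for association rules; the toy transactions, the function names (support, confidence, pair_rules), and the thresholds are all assumptions chosen for illustration, and the brute-force search over item pairs stands in for the more efficient algorithms the tutorial surveys.

```python
from itertools import combinations

# Hypothetical toy transaction database: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for single-item rules lhs -> rhs that meet
    both thresholds (illustrative only; not an efficient algorithm)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(transactions, 0.6, 0.9):
        print(f"{lhs} -> {rhs} (support={s:.2f}, confidence={c:.2f})")
```

In this toy data, a rule such as "beer -> diapers" says that transactions containing beer also tend to contain diapers, which is the kind of pattern that could inform the shelf planning application mentioned above.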

Bio:
Vipin Kumar is the Director of the Army High Performance Computing Research Center and Professor of Computer Science at the University of Minnesota. His research interests include high-performance computing and data mining. He has authored over 150 research articles and coedited or coauthored six books, including the widely used textbook "Introduction to Parallel Computing" and an upcoming book, "Data Mining for Scientific and Engineering Applications", edited by R. Grossman, C. Kamath, W. P. Kegelmeyer, V. Kumar, and R. Namburu, Kluwer Academic Publishers, 2001. Kumar was a Conference Co-Chair for the First SIAM International Conference on Data Mining and will serve as a Program Co-Chair for the IEEE International Conference on Data Mining, to be held in Japan in November 2002. Kumar serves on the editorial boards of Knowledge and Information Systems, Parallel Computing, and the Journal of Parallel and Distributed Computing, and has served on the editorial boards of IEEE Transactions on Knowledge and Data Engineering (1993-97), IEEE Concurrency (1997-2000), and IEEE Parallel and Distributed Technology (1995-97). He is a Fellow of IEEE, a member of SIAM and ACM, and a Fellow of the Minnesota Supercomputer Institute.


Back to Mathematical Challenges in Scientific Data Mining