Tutorial: Introduction to scientific data mining

Chandrika Kamath
Lawrence Livermore National Laboratory

As science and engineering data approaches the petabyte scale, data mining
techniques are increasingly being applied to identify useful information in
this data. In this tutorial, I will first give a brief introduction to
data mining in the context of science and engineering applications. Using
examples from diverse fields such as astronomy, biology, physics, and
remote sensing, I will identify the common threads that permeate the mining
of science data. I will also illustrate the issues that differentiate
scientific data mining from its commercial counterpart. I will then provide
a brief overview of the techniques used for processing science data to
prepare it for data mining. My goal is to show that the diversity of
applications, the richness of the problems faced by practitioners, and the
opportunity to borrow ideas from other, more established areas of data
analysis make scientific data mining an exciting and challenging field.


Chandrika Kamath received the Ph.D. degree in computer science
from the University of Illinois at Urbana-Champaign in 1986. Prior to
joining Lawrence Livermore National Laboratory in 1997, Chandrika was a
Consulting Software Engineer at Digital Equipment Corporation. Her
research interests are in large-scale data mining and pattern recognition,
including image processing, feature extraction, dimension reduction, and
classification and clustering algorithms. She is also interested in the
practical application of these techniques. Chandrika is currently the
project lead and an individual contributor for Sapphire, a project in
large-scale data mining.

Back to Mathematical Challenges in Scientific Data Mining