Data Mining: the Interface of Statistics and Computer Science

Padhraic Smyth (UC Irvine) (I)

Title for Talk 1: Data Mining: the Interface of Statistics and
Computer Science
Abstract for Talk 1: What have statistics and computer science
each contributed to data mining? In the
first part of this talk we will explore this question, examining
how ideas from both fields have each influenced data mining as
currently practiced. We will take an "algorithmic" view of data
analysis and discuss new research challenges in this context,
particularly those of relevance to scientific applications.
Selected research challenges to be discussed include scaling of
algorithms to massive data sets, high-dimensional optimization
problems in data mining, massive search for patterns, exploratory
data analysis in the context of prior knowledge, and the
challenges of spatio-temporal data.

Title for Talk 2: Mixture Models for Data Exploration and Prediction
Abstract for Talk 2: Finite mixture models often provide a relatively simple
yet effective way to model large complex data sets. We begin with
a brief tutorial overview of the representational capabilities of
mixtures and the use of the Expectation-Maximization (EM)
algorithm for fitting such models. We will then discuss
a broad variety of mixture model applications involving large data
sets, including the generalization of the "classic" multivariate
mixture model to temporal and spatial data settings. Applications
from atmospheric science, computational biology, epidemiology,
computer vision, and astronomy will be discussed, time permitting.
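To make the EM fitting step concrete, here is a minimal sketch for the simplest case only: a one-dimensional Gaussian mixture. This is not the speaker's code, and the talk covers far more general (multivariate, temporal, spatial) models. The names em_gmm_1d, normal_pdf and log_likelihood are hypothetical, and the quantile-based initialisation and the variance floor min_var are illustrative choices of ours. The E-step computes each component's responsibility for each point; the M-step re-estimates weights, means and variances as responsibility-weighted averages. The log-likelihood is recorded after every iteration, since EM should never decrease it.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture parameters."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_gmm_1d(data, k, max_iter=200, tol=1e-8, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, variances, history); history holds the
    log-likelihood after initialisation and after each M-step.
    """
    n = len(data)
    s = sorted(data)
    # Assumed initialisation: means at spread-out quantiles of the data,
    # every component starting with the pooled variance and equal weight.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    pooled_var = sum((x - mu) ** 2 for x in data) / n
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    history = [log_likelihood(data, weights, means, variances)]

    for _ in range(max_iter):
        # E-step: responsibility of component j for point x_i,
        # r_ij = w_j N(x_i | m_j, v_j) / sum_l w_l N(x_i | m_l, v_l).
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])

        # M-step: responsibility-weighted re-estimates of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # guard against a collapsing component

        history.append(log_likelihood(data, weights, means, variances))
        if history[-1] - history[-2] < tol:
            break

    return weights, means, variances, history


# Usage on synthetic data: two well-separated clusters centred at 0 and 6.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
weights, means, variances, history = em_gmm_1d(data, k=2)
print("weights:", [round(w, 3) for w in weights])
print("means:", [round(m, 3) for m in means])
print("variances:", [round(v, 3) for v in variances])
```

On data like this, the fitted means should land near the true cluster centres. Real applications of the kind listed above would typically need multivariate components, several random restarts, and a principled choice of k.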


Part of: Mathematical Challenges in Scientific Data Mining