Data Mining Challenges in Astronomy

George Djorgovski (Caltech) (I)

There has been an unprecedented and continuing growth in the
volume, quality, and complexity of astronomical data sets over
the past few years, mainly through large digital sky surveys.
The Virtual Observatory (VO) concept represents a framework needed
to cope with this data flood. There are many challenges posed
by the analysis of the large and complex data sets expected in
VO-based research. They are driven by the size and the
complexity of the data sets (billions of data vectors in
parameter spaces of tens or hundreds of dimensions), by the
heterogeneity of the data and measurement errors, by selection
effects and censored data, and by the intrinsic clustering
properties (functional form, topology) of the data distribution
in the parameter space of observed attributes. Examples of
scientific questions one may wish to address include: objective
determination of the number of object classes present in the
data, and the membership probabilities for each source; searches
for unusual, rare, or even new types of objects and phenomena;
discovery of physically interesting (generally multivariate)
correlations which may be present in some of the clusters; etc.
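
As a rough illustration of the first question (how many object classes
are present, and with what membership probabilities), the sketch below
fits Gaussian mixtures with 1, 2, and 3 components to synthetic data and
selects the number of components with the Bayesian Information Criterion
(BIC). This is one common approach and is not taken from the talk. The
one-dimensional synthetic sample, the function names (fit_mixture, bic),
and the parameter choices are all assumptions made for illustration;
real survey data would be multivariate and far larger.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return per-point membership probabilities and the log-likelihood."""
    resp, log_like = [], 0.0
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = max(sum(dens), 1e-300)  # guard against log(0)
        resp.append([d / total for d in dens])
        log_like += math.log(total)
    return resp, log_like


def fit_mixture(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    total_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [max(total_var, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp, _ = e_step(data, weights, means, variances)
        # M-step: re-estimate each component from its weighted members.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor avoids collapsed components
    resp, log_like = e_step(data, weights, means, variances)
    return weights, means, variances, log_like, resp


def bic(log_like, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    return -2.0 * log_like + (3 * k - 1) * math.log(n)


# Synthetic sample (an assumption): two well-separated "classes".
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(10.0, 1.0) for _ in range(200)]

fits = {k: fit_mixture(data, k) for k in (1, 2, 3)}
bics = {k: bic(fits[k][3], k, len(data)) for k in fits}
best_k = min(bics, key=bics.get)   # lowest BIC is preferred
memberships = fits[best_k][4]      # one probability vector per source

print("BIC by number of classes:", bics)
print("preferred number of classes:", best_k)
print("membership probabilities of first source:", memberships[0])
```

Each row of memberships gives the probability that a source belongs to
each fitted class. Sources with a low likelihood under every component
would be natural candidates for the "unusual or rare object" searches
mentioned above, although that step is not shown here.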

Presentation (PowerPoint File)
