We present a fast algorithm for detecting and characterizing a cloud of points concentrated around a curve in D-dimensional Euclidean space. The algorithm characterizes the cloud by detecting the underlying curve, separating a “stable” set from a “deviating” set (“outliers”), and estimating the local variances of the stable set around the underlying curve.
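To make the three tasks named above concrete (finding the curve, splitting stable from deviating points, and estimating local spread), here is a minimal two-dimensional sketch. It is not the algorithm of this talk, whose details the abstract does not give. It substitutes a generic technique: an iteratively trimmed polynomial fit with a per-bin robust scale. The function name `fit_curve_with_outliers` and all of its parameters (`degree`, `n_bins`, `k`, `n_iter`) are hypothetical, and the synthetic data are made up for illustration only.

```python
import numpy as np

def fit_curve_with_outliers(x, y, degree=3, n_bins=10, k=3.0, n_iter=10):
    """Hypothetical helper: trimmed polynomial fit with a binned local scale.

    This is a stand-in technique, not the method described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    stable = np.ones(x.size, dtype=bool)

    # Equal-width bins along x; every point gets an index in [0, n_bins - 1].
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    for _ in range(n_iter):
        # Fit the curve using only the points currently considered stable.
        coeffs = np.polyfit(x[stable], y[stable], degree)
        resid = y - np.polyval(coeffs, x)

        # Robust scale: 1.4826 * median absolute residual (Gaussian-consistent).
        global_sigma = 1.4826 * np.median(np.abs(resid[stable]))
        sigma_bins = np.full(n_bins, global_sigma)
        for b in range(n_bins):
            in_bin = stable & (bin_idx == b)
            if in_bin.sum() >= 5:  # fall back to the global scale in sparse bins
                sigma_bins[b] = 1.4826 * np.median(np.abs(resid[in_bin]))

        # A point is stable if its residual is within k local scales.
        sigma = np.maximum(sigma_bins[bin_idx], 1e-12)
        new_stable = np.abs(resid) <= k * sigma
        if np.array_equal(new_stable, stable):
            break
        stable = new_stable

    return coeffs, stable, sigma_bins

# Synthetic example (illustrative data only): a noisy sine with planted outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.05, x.size)
y[::50] += 2.0  # every 50th point is shifted far off the curve
coeffs, stable, sigma_bins = fit_curve_with_outliers(x, y, degree=5)
print("deviating points:", np.flatnonzero(~stable))
```

The per-bin scale is the part that mirrors the "local variances" idea: the threshold for calling a point deviating adapts to how noisy the data are near it, rather than using one global cutoff.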
We have adapted this algorithm to analyze DNA array data from ChIP-chip experiments as well as expression-profiling microarray data. We use it both for normalization and for ranking and identifying enriched sites (or differentially expressed genes).
Our methods accommodate the unique characteristics of ChIP-chip data, where the set of immunoprecipitation-enriched segments is large and asymmetric, and its proportion of the whole data set varies locally.
We establish some estimates for the performance of our algorithm and illustrate its efficiency on high-dimensional data by applying it to pixel neighborhoods of various images. Here, the “deviating points” detected by the algorithm correspond to edges in the original image.
This is joint work with Joseph McQuown and Bud Mishra. The ChIP-chip analysis is also joint work with Alexandre Blais and Brian David Dynlacht.
Back to MGA Workshop II: Multiscale Geometry in Scientific Computing