Knowledge discovery and data mining (KDD) in spatial and temporal databases faces several problems: the influence of unobserved or very noisy driving attributes, highly correlated data collected from a mixture of distributions, and large data sets that are often distributed across a number of locations.
These problems and the proposed solutions will be discussed in the context of geostatistical KDD applications currently being studied in our group. A technique for partitioning spatially heterogeneous data into more homogeneous regions through a competition of regression models will be discussed in detail. A convergence proof and experiments demonstrating time efficiency and accuracy will be presented, followed by an application aimed at deriving rules for improving treatment in precision agriculture. In addition, spatial-temporal prediction in the absence of some independent variables that influence the predicted response will be discussed, and a novel estimation method that exploits auto-regressive modeling of spatial residuals in time will be presented. Finally, a performance-controlled data reduction algorithm aimed at facilitating efficient data transfer for centralized forecasting will be explained. The approach consists of identifying and eliminating redundant data, followed by compressing the driving attributes based on a sensitivity analysis. Compression by a factor of up to 13 thousand with less than 5% accuracy loss will be reported for a large forestry spatial data set.
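To make the partitioning idea above concrete, here is a minimal sketch of one generic way that regression models can compete for data points. It is not the algorithm presented in the talk. It assumes a single driving attribute, ordinary least-squares lines as the competing models, and a simple rule that hands each point to the model with the smallest squared error. The names `fit_line` and `competitive_partition` are hypothetical, and the sketch ignores spatial coordinates entirely.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        # All x values coincide: fall back to a constant model.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def squared_error(model, point):
    """Squared prediction error of a (slope, intercept) model on one point."""
    a, b = model
    x, y = point
    return (y - (a * x + b)) ** 2


def competitive_partition(points, n_models=2, max_iter=100, seed=0):
    """Alternate between refitting each model and reassigning each point.

    Assumption: this is a k-means-style loop, used only as an illustration.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(n_models) for _ in points]
    # Every model starts as the global fit, so sparse groups have a fallback.
    models = [fit_line(points)] * n_models
    for _ in range(max_iter):
        # Refit step: each model is re-estimated on the points it currently owns.
        for k in range(n_models):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 2:
                models[k] = fit_line(members)
        # Competition step: each point goes to the model that predicts it best.
        new_labels = [
            min(range(n_models), key=lambda k: squared_error(models[k], p))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
    return labels, models


# Toy data drawn from two different linear regimes (illustrative only).
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(float(x), -1.0 * x + 20.0) for x in range(10)]
labels, models = competitive_partition(points, n_models=2)
```

Neither step can increase the total squared error, so this loop settles on a stable assignment, although it may stop in a local optimum that depends on the random initial labels.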
The results reported in this talk were obtained in collaboration with Tim Fiez, Aleksandar Lazarevic, Dragoljub Pokrajac and Slobodan Vucetic. More details can be found at www.ist.temple.edu.
Zoran Obradovic is the Director of the Center for Information Science and Technology and a Professor of Computer and Information Sciences at Temple University. His research focuses on solving challenging problems in bioinformatics, geostatistics and computational finance by developing and integrating data mining and statistical learning technology for efficient knowledge discovery in large databases. Funded by NSF, NIH, DOE and industry, he has contributed over the last decade to about 120 refereed articles on these and related topics and to several academic and commercial software systems.