Graph and Autoencoder-Based Unsupervised Feature Selection with Broad and Local Data Structure Preservation

Marco Duarte
University of Massachusetts Amherst
Electrical and Computer Engineering

Feature selection is a dimensionality reduction technique that selects a subset of representative features from high-dimensional data to provide simpler and more comprehensible data models. Recently, feature selection combined with sparse learning has attracted significant attention because it outperforms traditional feature selection methods that ignore correlation between features. These methods first map the data onto a low-dimensional subspace and then select features by imposing a sparsity constraint on the transformation matrix. However, they are restricted by design to linear data transformations, a potential drawback given that the underlying data correlation structures are often non-linear. To obtain a more expressive embedding, we propose an unsupervised feature selection approach based on a single-layer autoencoder. More specifically, we enforce column sparsity on the weight matrix connecting the input layer and the hidden layer, as in previous work. Additionally, we incorporate spectral graph analysis of the projected data into the learning process to preserve local data geometry when mapping from the original data space to the low-dimensional feature space. Extensive experiments are conducted on image, audio, text, and biological data. The promising experimental results support the advantages of the proposed method over existing approaches. This is joint work with Siwei Feng.
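As a rough illustration of how the three ingredients named in the abstract might fit together, the sketch below combines a single-layer autoencoder reconstruction term, a group-sparsity (L2,1) penalty on the input-to-hidden weights, and a graph Laplacian term on the hidden representation. It is not the authors' implementation. The sigmoid activation, the k-nearest-neighbor graph, the weights lam and gamma, and all function names are assumptions made for illustration. Samples are stored as rows, so each input feature corresponds to a row of W1; under the transposed convention the same groups are columns, as in the abstract.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - A of a symmetrized kNN graph on the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbors per sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                            # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def objective(X, W1, b1, W2, b2, L, lam=0.1, gamma=0.1):
    """Reconstruction loss + lam * L2,1 penalty on W1 + gamma * graph smoothness of H."""
    n = X.shape[0]
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))          # hidden layer (sigmoid is an assumption)
    X_hat = H @ W2 + b2                               # linear decoder
    recon = 0.5 / n * np.sum((X - X_hat) ** 2)
    sparsity = np.sum(np.linalg.norm(W1, axis=1))     # sum of per-feature group norms
    graph = np.trace(H.T @ L @ H) / n                 # small when graph neighbors stay close in H
    return recon + lam * sparsity + gamma * graph

def feature_scores(W1):
    """Score each input feature by the norm of its weight group in W1."""
    return np.linalg.norm(W1, axis=1)

def select_features(W1, num_features):
    """Indices of the num_features highest-scoring input features."""
    return np.argsort(-feature_scores(W1))[:num_features]
```

In a sketch like this, the parameters would be fit by minimizing the objective with any gradient-based optimizer. Features whose weight groups shrink toward zero contribute little to the embedding and are ranked last by select_features.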

Back to Science at Extreme Scales: Where Big Data Meets Large-Scale Computing