Nonnegative Matrix and Tensor Factorizations for Text Mining Applications

Michael Berry
University of Tennessee
Computer Science

Automated approaches for identifying and clustering semantic features or topics are highly desirable in text mining applications. A low-rank non-negative matrix factorization (NNMF) preserves the natural non-negativity of the data, which eliminates the subtractive basis-vector and encoding calculations required by techniques such as principal component analysis for semantic feature abstraction. Non-negative tensor factorization (NNTF) additionally exploits temporal and semantic proximity, enabling the tracking of focused discussions as well as latent (unknown) communication patterns.
Demonstrations of NNMF and NNTF algorithms for topic (or discussion) detection and tracking are presented, using the Enron email dataset and the Aviation Safety Reporting System (ASRS) document collection.
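
To make the idea of a low-rank non-negative factorization concrete, the following is a minimal sketch, not the algorithm used in the talk. It assumes the widely used multiplicative-update rule for the Frobenius-norm objective and a tiny hypothetical term-by-document matrix `A`; the function name `nnmf`, the parameters `rank`, `n_iter`, and `eps`, and the toy data are all illustrative assumptions. The sketch factors `A` into non-negative factors `W` (terms by topics) and `H` (topics by documents), so each document is an additive combination of topic vectors.

```python
import numpy as np


def nnmf(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate A (m x n, non-negative) as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    This is an illustrative sketch, not the method from the talk.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive random start so the updates stay non-negative.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so no
        # subtraction is ever applied to the factors themselves.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy term-by-document matrix: rows are terms, columns are
# documents. Documents 0-1 use only terms 0-1; documents 2-3 use only
# terms 2-3, so two topics are present by construction.
A = np.array([
    [2.0, 4.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
])

W, H = nnmf(A, rank=2)

# Relative reconstruction error of the rank-2 approximation.
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)

# Assign each document to the topic with the largest weight in H.
doc_topics = H.argmax(axis=0)
```

Here each column of `W` can be read as a topic (a non-negative weighting over terms) and each column of `H` as the topic mixture of one document. A real application would start from a weighted term-by-document matrix built from the corpus rather than hand-written counts.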


Presented at Workshop II: Numerical Tools and Fast Algorithms for Massive Data Mining, Search Engines and Applications