Abstract

Nonnegative Matrix and Tensor Factorizations for Text Mining Applications

Michael Berry

University of Tennessee

Automated approaches for identifying and clustering semantic features or topics are highly desirable in text mining applications. A low-rank nonnegative matrix factorization (NNMF) algorithm preserves the natural nonnegativity of the data, eliminating the subtractive basis vector and encoding calculations that techniques such as principal component analysis require for semantic feature abstraction. With nonnegative tensor factorization (NNTF), temporal and semantic proximity can be exploited to track focused discussions as well as latent (unknown) communication patterns.
We present demonstrations of NNMF and NNTF algorithms for topic (or discussion) detection and tracking using the Enron email dataset and the Aviation Safety Reporting System (ASRS) document collection.
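
As a rough illustration of the low-rank nonnegative factorization described above, the following Python sketch approximates a small term-by-document matrix A by a product W H with all entries nonnegative. The abstract does not say which NNMF algorithm the talk uses; the multiplicative update rule of Lee and Seung is assumed here purely for illustration. The function name `nmf_multiplicative`, the toy matrix, and the term list are hypothetical and are not taken from the Enron or ASRS collections.

```python
import numpy as np


def nmf_multiplicative(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (terms x documents) as W @ H.

    Assumption: Lee-Seung multiplicative updates for the Frobenius norm.
    This is a common NNMF scheme, not necessarily the one used in the talk.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive random starting factors.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H
        # stay nonnegative and no subtractive combinations arise.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: rows are terms, columns are documents.
terms = ["flight", "engine", "runway", "email", "meeting", "contract"]
A = np.array(
    [
        [3, 2, 0, 0],
        [2, 3, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 3, 2],
        [0, 0, 2, 2],
        [0, 0, 1, 3],
    ],
    dtype=float,
)

W, H = nmf_multiplicative(A, rank=2)

# Relative reconstruction error of the rank-2 approximation.
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative error: {rel_err:.3f}")

# Columns of W act as topics (weights over terms);
# columns of H encode each document in terms of those topics.
for k in range(W.shape[1]):
    top = np.argsort(W[:, k])[::-1][:3]
    print(f"topic {k}:", [terms[i] for i in top])
```

Because every entry of W and H is nonnegative, each document is modeled as an additive mixture of topics, which is the property that makes the factors easier to read as semantic features than the signed basis vectors of principal component analysis.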

Back to Workshops II: Numerical Tools and Fast Algorithms for Massive Data Mining, Search Engines and Applications