Inference techniques for network data improve classification performance by exploiting dependencies between the attributes of related instances. In particular, a great deal of recent attention has been paid to collective classification procedures, which make simultaneous inferences over the attributes of related instances. Collective classification has been shown to be particularly effective at overcoming substantial amounts of missing attribute information. In this talk, I will show that in many tasks collective classification does not perform well, apparently due to low levels of social selection (a.k.a. homophily) or social influence (a.k.a. relational autocorrelation) in the network data. I will argue that in such cases, leveraging information about the relational network structure (e.g., betweenness centrality and clustering coefficient) can improve classification performance.

I will also present a survey of the experimental methodologies used to evaluate collective classifiers. Our survey reveals that these methodologies fall into two main groups, based on distinct formulations of the classification problem: (1) across-network classification and (2) within-network classification. While the methodology for the across-network setting is relatively straightforward, methodologies for within-network classification are more complex and varied. I will explore a number of these variations and present experimental results that illustrate important differences among the methodologies for within-network classification.
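As a minimal sketch of the idea of leveraging relational network structure, the snippet below computes betweenness centrality, clustering coefficient, and degree for each node and assembles them into per-node feature vectors that could supplement attribute features for a downstream classifier. The graph (a standard toy network) and the choice of features are illustrative assumptions, not the talk's actual data or method; the sketch uses the networkx library.

```python
# Sketch (illustrative, not the talk's method): deriving relational-structure
# features such as betweenness centrality and clustering coefficient, which
# can help when homophily / relational autocorrelation is low.
import networkx as nx

# Toy graph standing in for a relational network (34 nodes).
G = nx.karate_club_graph()

betweenness = nx.betweenness_centrality(G)  # dict: node -> centrality score
clustering = nx.clustering(G)               # dict: node -> clustering coeff.

# Assemble a per-node feature vector for a downstream classifier.
features = {n: [betweenness[n], clustering[n], G.degree(n)] for n in G.nodes()}
print(features[0])
```

These structural features are computed once from the graph alone, so unlike collective inference they do not depend on the (possibly missing) attribute values of neighboring instances.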
This is joint work with Brian Gallagher (LLNL), Lise Getoor (UMD), and her students: Prithviraj Sen, Galileo Namata, and Mustafa Bilgic.
Workshop I: Dynamic Searches and Knowledge Building