Multi-Document Summary Space: What do People Agree is Important?
John Conroy
IDA Center for Computing Sciences
A multi-document summary gives the “gist” of what is contained in a collection of related documents.
But how can we define a “gist”? We explore this question by analyzing human-written summaries for
clusters of document sets. In particular, we estimate the probability that a word will be chosen by a
human to be included in a summary. We demonstrate that if this probability model were given by an
oracle, then a simple automatic summarization method could produce extract summaries that are statistically indistinguishable from the human summaries.
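The idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's actual method: it assumes the word probability is estimated as the fraction of human summaries that contain the word, that a sentence is scored by the average probability of its words, and that sentences are picked greedily up to a word budget. All function names and the `max_words` parameter are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def estimate_word_probabilities(human_summaries):
    """Assumed estimator: P(word) = fraction of human summaries containing it."""
    counts = Counter()
    for summary in human_summaries:
        # Count each word at most once per summary.
        counts.update(set(tokenize(summary)))
    total = len(human_summaries)
    return {word: count / total for word, count in counts.items()}


def score_sentence(sentence, probs):
    """Assumed score: mean word probability; unseen words count as 0."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(probs.get(token, 0.0) for token in tokens) / len(tokens)


def extract_summary(sentences, probs, max_words=100):
    """Greedily keep the highest-scoring sentences that fit the word budget."""
    ranked = sorted(sentences, key=lambda s: score_sentence(s, probs), reverse=True)
    chosen, used = [], 0
    for sentence in ranked:
        length = len(tokenize(sentence))
        if used + length <= max_words:
            chosen.append(sentence)
            used += length
    return chosen
```

In this sketch the human summaries play the role of the oracle: the probabilities come directly from the reference summaries, and the extract is built from sentences of the source documents.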
