A multi-document summary gives the “gist” of what is contained in a collection of related documents.
But how can we define a “gist”? We explore this question by analyzing human-written summaries for
clusters of documents. In particular, we estimate the probability that a word will be chosen by a
human to be included in a summary. We demonstrate that if this probability model were given by an
oracle, then a simple automatic method of summarization could produce extract summaries that are
statistically indistinguishable from the human summaries.
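
As a rough illustration of the idea, the sketch below estimates each word's probability as the fraction of human summaries that contain it, then builds an extract by greedily picking the sentences whose words have the highest average probability. The estimator, the averaging score, the greedy selection, and all function names (`oracle_word_probs`, `score_sentence`, `extract_summary`) are assumptions made for illustration; the abstract does not specify these details.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def oracle_word_probs(human_summaries):
    """Estimate P(word appears in a human summary) as the fraction
    of the given human summaries that contain the word."""
    counts = Counter()
    for summary in human_summaries:
        counts.update(set(tokenize(summary)))  # count each word once per summary
    n = len(human_summaries)
    return {word: c / n for word, c in counts.items()}


def score_sentence(sentence, probs):
    """Average oracle probability over the distinct words of a sentence."""
    words = set(tokenize(sentence))
    if not words:
        return 0.0
    return sum(probs.get(w, 0.0) for w in words) / len(words)


def extract_summary(sentences, probs, max_words=100):
    """Greedily keep the highest-scoring sentences that fit the word budget."""
    ranked = sorted(sentences, key=lambda s: score_sentence(s, probs), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        length = len(tokenize(s))
        if used + length <= max_words:
            chosen.append(s)
            used += length
    return chosen
```

For example, given two human summaries of a document cluster, `oracle_word_probs` assigns probability 1.0 to words that appear in both and 0.5 to words that appear in only one; `extract_summary` then prefers source sentences dominated by the high-probability words.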