Multi-Document Summary Space: What do People Agree is Important?

John Conroy
IDA Center for Computing Sciences

A multi-document summary gives the “gist” of what is contained in a collection of related documents.
But how can we define a “gist?” We explore this question by analyzing human-written summaries for
clusters of document sets. In particular, we estimate the probability that a word will be chosen by a
human to be included in a summary. We demonstrate that if this probability model were given by an
oracle, then a simple automatic method of summarization could produce extract summaries that are
statistically indistinguishable from the human summaries.
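
As a rough illustration of the idea, the sketch below estimates an oracle word probability as the fraction of human summaries that contain each word, then builds an extract by greedily choosing the sentences with the highest average word probability. This is not the method presented in the talk: the tokenizer, the averaging score, the greedy selection, the word budget, and all function names are assumptions made here for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def oracle_word_probs(human_summaries):
    """Estimate P(word is chosen) as the fraction of human summaries containing it."""
    counts = Counter()
    for summary in human_summaries:
        # Count each word at most once per summary.
        counts.update(set(tokenize(summary)))
    n = len(human_summaries)
    return {word: c / n for word, c in counts.items()}


def sentence_score(sentence, probs):
    """Average oracle probability over the distinct words of a sentence."""
    words = set(tokenize(sentence))
    if not words:
        return 0.0
    return sum(probs.get(w, 0.0) for w in words) / len(words)


def extract_summary(sentences, probs, max_words=100):
    """Greedily keep the highest-scoring sentences that fit the word budget."""
    ranked = sorted(sentences, key=lambda s: sentence_score(s, probs), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        length = len(tokenize(s))
        if used + length <= max_words:
            chosen.append(s)
            used += length
    # Return the chosen sentences in their original document order.
    return [s for s in sentences if s in chosen]


if __name__ == "__main__":
    # Toy data, invented for this example.
    human_summaries = [
        "The flood destroyed homes.",
        "Homes were destroyed by the flood.",
        "A flood hit the town.",
    ]
    sentences = [
        "The flood destroyed many homes.",
        "The weather was sunny last week.",
        "Officials visited the town.",
    ]
    probs = oracle_word_probs(human_summaries)
    print(extract_summary(sentences, probs, max_words=5))
```

In this toy setup, common function words such as "the" receive a high probability because every human summary contains them; a more careful version would likely remove stop words or apply stemming before counting.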

