There is a big gap between the summaries produced by current automatic summarizers and the abstracts written by human professionals. Certainly one factor contributing to this gap is that automatic systems cannot always correctly identify the important topics of an article. Another factor, however, which has received little attention, is that automatic summarizers have poor text generation techniques. Most automatic summarizers rely on extracting key sentences or paragraphs from an article to produce a summary. Since the extracted sentences are disconnected in the original article, when they are strung together, the resulting summary can be inconcise, incoherent, and sometimes even misleading. We present a cut-and-paste based text summarization technique, aimed at reducing the gap between automatically generated summaries and human-written abstracts. Rather than focusing on how to identify key sentences, as other researchers do, we study how to generate the text of a summary once key sentences have been extracted. The main idea of cut-and-paste summarization is to reuse the text in an article to generate the summary. However, instead of simply extracting sentences as current summarizers do, the cut-and-paste system "smooths" the extracted sentences by editing them. Such edits mainly involve cutting phrases and pasting them together in novel ways. The key features of this work are: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We thank IBM for licensing us the ESG parser and the MITRE Corporation for licensing us the coreference resolution system. Finally, we conclude and discuss future work.
We will also extend the system to query-based summarization and investigate whether the system can be modified for multiple-document summarization. This paper presents a novel architecture for text summarization using cut-and-paste techniques observed in human-written abstracts. Related work is discussed in Section 6. We identified six operations that can be used alone or together to transform extracted sentences into sentences in human-written abstracts. Mani et al. (1999) addressed the problem of revising summaries to improve their quality. However, the combination operations and combination rules that we derived from corpus analysis are significantly different from those used in that system, which mostly came from operations in traditional natural language generation.
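The idea of editing extracted sentences by cutting phrases and pasting the results together can be illustrated with a small toy sketch in Python. Everything in it is an assumption made for illustration: the function names `extract_sentences`, `cut`, and `paste`, the plain string matching, and the sample article are all hypothetical. The system described here works from a parser and a coreference resolution system, and this sketch does not attempt to reproduce its six operations or its combination rules.

```python
import re


def extract_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cut(sentence, phrases_to_remove):
    """Toy 'cut' step: delete the given phrases, then tidy the spacing."""
    for phrase in phrases_to_remove:
        sentence = sentence.replace(phrase, "")
    sentence = re.sub(r"\s+", " ", sentence)            # collapse repeated spaces
    sentence = re.sub(r"\s+([,.;])", r"\1", sentence)   # no space before punctuation
    return sentence.strip()


def paste(first, second, connective="and"):
    """Toy 'paste' step: merge two sentences into one with a connective."""
    head = first.rstrip(". ")                 # drop the final period of the first sentence
    tail = second[0].lower() + second[1:]     # lowercase the start of the second sentence
    return f"{head}, {connective} {tail}"


# Hypothetical input: two sentences assumed to have been extracted already.
article = (
    "The system, which was built last year, extracts key sentences. "
    "It then edits them to read smoothly."
)
sentences = extract_sentences(article)

# Cut a non-essential clause from the first sentence.
reduced = cut(sentences[0], [", which was built last year,"])
print(reduced)
# The system extracts key sentences.

# Paste the reduced sentence together with the second one.
print(paste(reduced, sentences[1]))
# The system extracts key sentences, and it then edits them to read smoothly.
```

In this sketch the phrases to cut are supplied by hand. Deciding which phrases can be removed, and which sentences can be merged, is the hard part that the summarizer itself has to solve.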