University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Araly Barrera

Will defend her thesis

Automated Extractive Single Document Summarization: Beating the Baselines

Abstract

The goal of automated summarization is to combat the “Information Overload” problem by communicating the most important contents of a document in a compressed form. Most recent efforts have focused on multi-document summarization, largely because of the difficulties believed to exist in single-document summarization, where systems have underperformed against baselines. The goal of this study is to reconsider the importance of single-document summarization by introducing a new approach and its implementation, SynSem. The SynSem approach fuses syntactic, semantic, and statistical methodologies and reflects psychological findings on the selection patterns humans follow as they construct summaries. SynSem achieves successful summary evaluation results and outperforms the baselines on two separate datasets: the Document Understanding Conference (DUC) 2002 dataset and a cognitive experiment article set. These results have implications for both extractive and abstractive summarization and could also be leveraged in multi-document summarization.

Date: Tuesday, April 5, 2011
Time: 1:30 PM
Place: 362-PGH
Faculty, students, and the general public are invited.
Advisor: Prof. Rakesh Verma