Text Summarization



Automatic summarization aims to extract the most important content from an information source and present it to the user. Two types of summaries are generally produced: extracts, i.e., summaries containing text segments copied verbatim from the input, and abstracts, i.e., summaries consisting of text segments that are not present in the input.

One difficulty in summary evaluation is that it involves human judgments of several quality criteria such as coherence, readability and content. There is no single correct summary, and a system may output a good summary that is quite different from a human reference summary (the same problem arises in machine translation, speech synthesis, etc.).



Traditionally, summarization evaluation compares the summaries output by a tool with sentences previously extracted by human assessors or judges. The basic idea is that automatic evaluation should correlate with human assessment.

Two main methods are used for evaluating text summarization. Intrinsic evaluation compares machine-generated summaries with human-generated summaries; it is considered a system-focused evaluation. Extrinsic evaluation measures the performance of summarization in various tasks; it is considered a task-specific evaluation.

Both methods require significant human resources, relying on key-sentence (or sentence-fragment) mark-up and human-generated summaries for the source documents. Summarization evaluation measures provide a ranking score that can be used to compare different summaries of a document.



- Sentence precision/recall based evaluation
- Content similarity measures
  - ROUGE (Lin, 2004), cosine similarity, n-gram overlap, LSI (Latent Semantic Indexing), etc.
- Sentence rank
- Utility measures
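The first two families of measures above can be sketched as simple token-level computations. The following is a minimal illustration, not code from any standard evaluation toolkit: sentence-level precision/recall against human-selected sentences, a plain ROUGE-N-style n-gram recall in the spirit of Lin (2004), and cosine similarity over term-frequency vectors. Function names and the toy sentences are illustrative assumptions.

```python
import math
from collections import Counter


def sentence_precision_recall(system_sents, reference_sents):
    """Precision/recall of extracted sentences vs. human-selected ones."""
    system, reference = set(system_sents), set(reference_sents)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system_tokens, reference_tokens, n=1):
    """ROUGE-N as n-gram recall: matched reference n-grams / total reference n-grams."""
    sys_grams = ngrams(system_tokens, n)
    ref_grams = ngrams(reference_tokens, n)
    if not ref_grams:
        return 0.0
    matched = sum(min(count, sys_grams[gram]) for gram, count in ref_grams.items())
    return matched / sum(ref_grams.values())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between term-frequency vectors of two token sequences."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy comparison of a system summary against one human reference
system = "the cat sat on the mat".split()
reference = "the cat was on the mat".split()
print(rouge_n(system, reference, n=1))        # 5 of 6 reference unigrams matched
print(cosine_similarity(system, reference))
```

In practice, ROUGE scores are averaged over several human reference summaries, since no single reference is the unique correct summary.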




- NTCIR (NII Test Collection for IR Systems) includes Text Summarization tasks, e.g. MuST (Multimodal Summarization for Trend Information) at NTCIR-7.


- TAC (Text Analysis Conference): Recognizing Textual Entailment (RTE), Summarization, etc.
- TIPSTER : See the TIPSTER Text Summarization Evaluation: SUMMAC
- TIDES (Translingual Information Detection, Extraction and Summarization).

TIDES included several evaluation projects:

    • Information Retrieval: HARD (High Accuracy Retrieval from Documents).
    • Information Detection: TDT (Topic Detection and Tracking).
    • Information Extraction: ACE (Automatic Content Extraction).
    • Summarization: DUC (Document Understanding Conference). DUC has moved to the Text Analysis Conference (TAC).

- CHIL (Computers in the Human Interaction Loop) included a Text Summarization task.
- GERAF (Guide pour l’Evaluation des Résumés Automatiques Français): Guide for the Evaluation of Automatic Summarization in French.

  • Lin C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25-26.