Text summarization is a natural language processing task that generates concise summaries capturing the key information from longer documents. These compact representations are useful for previewing and indexing content in multimodal search systems.
Extractive summarization selects the most important sentences from the original document, while abstractive summarization generates new sentences that paraphrase the key points. Modern systems use transformer-based models that encode the full document and decode a summary. Large language models can produce high-quality abstractive summaries with appropriate prompting.
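The extractive approach can be sketched with a simple frequency-based scorer: each sentence is scored by the average document-wide frequency of its words, and the top-scoring sentences are returned in their original order. This is a minimal illustration using only the standard library, not a production summarizer; the function name and scoring heuristic are illustrative assumptions.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Toy extractive summarizer: score each sentence by the average
    document-wide frequency of its words, then return the top-scoring
    sentences in their original document order."""
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Term frequencies over the whole document (lowercased).
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    # Rank sentence indices by score, keep the best, restore order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:num_sentences])
    return ' '.join(sentences[i] for i in keep)
```

Real extractive systems refine this idea with positional weights and graph-based centrality rather than raw term frequency, but the select-and-concatenate structure is the same.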
Extractive methods use sentence scoring based on position, term frequency, and graph-based centrality (e.g., TextRank). Abstractive models such as BART, PEGASUS, and T5 are encoder-decoder transformers fine-tuned on summarization datasets. LLM-based summarization uses prompting or fine-tuning for controllable summary generation. Evaluation metrics include ROUGE (n-gram overlap with a reference summary), BERTScore (embedding-based semantic similarity), and human preference ratings.
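Of the metrics above, ROUGE-N is the simplest to compute: it measures the fraction of reference n-grams that also appear in the candidate summary, with counts clipped so a repeated candidate n-gram cannot match more reference occurrences than exist. A minimal recall-only sketch (whitespace tokenization is an assumption; standard implementations add stemming and precision/F-score variants):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: fraction of reference n-grams that also
    appear in the candidate, with clipped (min) counts."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most
    # as many times as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())
```

For example, comparing the candidate "the cat sat" against the reference "the cat sat on the mat" at the unigram level matches 3 of the 6 reference tokens, giving a ROUGE-1 recall of 0.5.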