genericSummary {LSAfun}    R Documentation

Summarize a text

Description

Selects sentences from a text that best describe its topic

Usage

genericSummary(text,k,split=c(".","!","?"),min=5,...)

Arguments

text

A character vector with length(text) = 1 specifying the text to be summarized

k

The number of sentences to be used in the summary

split

A character vector specifying which symbols determine the end of a sentence in the document

min

The minimum number of words a sentence must have to be included in the computations

...

Further arguments to be passed on to textmatrix

Details

Applies the method of Gong & Liu (2001) for generic summarization of a text document D via Latent Semantic Analysis:

  1. Decompose the document D into individual sentences, and use these sentences to form the candidate sentence set S, and set k = 1.

  2. Construct the terms by sentences matrix A for the document D.

  3. Perform the SVD on A to obtain the singular value matrix \Sigma, and the right singular vector matrix V^t. In the singular vector space, each sentence i is represented by the column vector \psi_i = [v_i1, v_i2, ..., v_ir]^t of V^t.

  4. Select the k'th right singular vector from matrix V^t.

  5. Select the sentence which has the largest index value with the k'th right singular vector, and include it in the summary.

  6. If k reaches the predefined number, terminate the operation; otherwise, increment k by one, and go to Step 4.

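(Cited directly from Gong & Liu, 2001, p. 21). In these steps, k is a running counter; the predefined number in Step 6 corresponds to the argument k.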
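
The steps above can be sketched in base R as follows. This is a minimal illustration, not the code used by genericSummary: the helper name sketch_summary, the crude tokenizer, and the raw term counts are all assumptions made for this example, whereas the package builds its matrix with textmatrix. Because the sign of a singular vector is arbitrary, the sketch also assumes that "largest index value" can be read as the largest absolute entry.

```r
# Hypothetical helper (not part of LSAfun): a base-R sketch of Steps 1-6.
sketch_summary <- function(text, k, split = c(".", "!", "?"), min = 5) {
  # Step 1: split the document into candidate sentences.
  # Assumes the split symbols are safe inside a regex character class.
  pattern <- paste0("[", paste(split, collapse = ""), "]")
  sentences <- trimws(unlist(strsplit(text, pattern)))

  # Crude tokenizer (an assumption; textmatrix does its own preprocessing).
  tokens <- lapply(sentences, function(s) {
    words <- unlist(strsplit(tolower(s), "[^a-z']+"))
    words[nzchar(words)]
  })

  # Drop sentences with fewer than `min` words.
  keep <- lengths(tokens) >= min
  sentences <- sentences[keep]
  tokens <- tokens[keep]

  # Step 2: build the terms-by-sentences count matrix A.
  vocab <- sort(unique(unlist(tokens)))
  counts <- vapply(
    tokens,
    function(w) as.numeric(table(factor(w, levels = vocab))),
    numeric(length(vocab))
  )
  A <- matrix(counts, nrow = length(vocab))

  # Step 3: SVD of A. Column j of V is the j-th right singular vector,
  # holding one entry per sentence.
  V <- svd(A)$v
  stopifnot(k >= 1, k <= ncol(V))

  # Steps 4-6: for j = 1, ..., k, pick the sentence with the largest
  # absolute entry in the j-th right singular vector (sign is arbitrary).
  picked <- vapply(seq_len(k), function(j) which.max(abs(V[, j])), integer(1))
  sentences[picked]
}

# Usage on a small made-up document; "Short one" falls below min = 5.
doc <- paste(
  "Cats chase mice around the old barn.",
  "Dogs chase cats around the old barn!",
  "Are mice afraid of cats and dogs?",
  "Short one."
)
res <- sketch_summary(doc, k = 2)
print(res)
```

Note that nothing in the quoted steps prevents the same sentence from being selected for two different singular vectors, so this sketch may return duplicates.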

Value

A character vector of length k containing the selected sentences

Author(s)

Fritz Guenther

See Also

textmatrix, lsa, svd

Examples

D <- "This is just a test document. It is set up just to throw some random 
sentences in this example. So do not expect it to make much sense. Probably, even 
the summary won't be very meaningful. But this is mainly due to the document not being
meaningful at all. For test purposes, I will also include a sentence in this 
example that is not at all related to the rest of the document. Lions are larger than cats."

genericSummary(D,k=1)

[Package LSAfun version 0.7.1 Index]