genericSummary {LSAfun} | R Documentation |
Summarize a text
Description
Selects sentences from a text that best describe its topic
Usage
genericSummary(text,k,split=c(".","!","?"),min=5,...)
Arguments
text |
A character vector containing the text to be summarized |
k |
The number of sentences to be used in the summary |
split |
A character vector specifying which symbols determine the end of a sentence in the document |
min |
The minimum number of words a sentence must have to be included in the computations |
... |
Further arguments to be passed on to |
Details
Applies the method of Gong & Liu (2001) for generic summarization of a text document D via Latent Semantic Analysis:
1. Decompose the document D into individual sentences, and use these sentences to form the candidate sentence set S, and set k = 1.
2. Construct the terms by sentences matrix A for the document D.
3. Perform the SVD on A to obtain the singular value matrix \Sigma, and the right singular vector matrix V^t. In the singular vector space, each sentence i is represented by the column vector \psi_i = [v_i1, v_i2, ... , v_ir]^t of V^t.
4. Select the k'th right singular vector from matrix V^t.
5. Select the sentence which has the largest index value with the k'th right singular vector, and include it in the summary.
6. If k reaches the predefined number, terminate the operation; otherwise, increment k by one, and go to Step 4.
(Cited directly from Gong & Liu, 2001, p. 21)
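The steps above can be sketched in base R as follows. This is a minimal illustration only, not the LSAfun implementation: the helper name sketch_summary and its argument min_words are hypothetical, the matrix A holds raw term counts without any weighting, and the "largest index value" is taken in absolute terms because the sign of a singular vector returned by svd is arbitrary.

```r
# Illustrative sketch of the Gong & Liu (2001) selection steps in base R.
# sketch_summary() is a hypothetical helper, not part of LSAfun.
sketch_summary <- function(text, k, split = c(".", "!", "?"), min_words = 5) {
  # Step 1: split into sentences at any of the end-of-sentence symbols.
  # Assumes the symbols are safe to place inside a regex character class.
  pattern <- paste0("[", paste(split, collapse = ""), "]")
  sentences <- trimws(strsplit(text, pattern)[[1]])

  # Tokenize into lower-case words and keep sentences with enough words.
  tokens <- lapply(sentences, function(s) {
    w <- strsplit(tolower(s), "[^a-z']+")[[1]]
    w[nzchar(w)]
  })
  keep <- lengths(tokens) >= min_words
  sentences <- sentences[keep]
  tokens <- tokens[keep]
  if (length(sentences) == 0) return(character(0))

  # Step 2: terms-by-sentences matrix A of raw term counts
  # (one row per term, one column per sentence).
  vocab <- sort(unique(unlist(tokens)))
  counts <- lapply(tokens, function(w) {
    as.numeric(table(factor(w, levels = vocab)))
  })
  A <- matrix(unlist(counts), nrow = length(vocab))

  # Step 3: SVD. Column j of V is the j'th right singular vector,
  # with one entry per sentence.
  V <- svd(A)$v

  # Steps 4 to 6: for each of the first k right singular vectors, pick the
  # sentence with the largest absolute entry (assumption: absolute value).
  n_pick <- min(k, ncol(V))
  chosen <- vapply(seq_len(n_pick),
                   function(j) which.max(abs(V[, j])),
                   integer(1))
  sentences[chosen]
}

txt <- "Cats chase small mice at night. Dogs chase cats in the yard. Ok. The weather is nice and sunny today."
res <- sketch_summary(txt, k = 2)
```

In this sketch the returned sentences come back without their terminating punctuation, sentences shorter than min_words are dropped before the matrix is built, and the same sentence may be selected for more than one singular vector.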
Value
A character vector of length k
Author(s)
Fritz Guenther
See Also
textmatrix, lsa, svd
Examples
D <- "This is just a test document. It is set up just to throw some random
sentences in this example. So do not expect it to make much sense. Probably, even
the summary won't be very meaningful. But this is mainly due to the document not being
meaningful at all. For test purposes, I will also include a sentence in this
example that is not at all related to the rest of the document. Lions are larger than cats."
genericSummary(D,k=1)