plot_doclist {LSAfun} | R Documentation |
2D- or 3D-Plot of a list of sentences/documents
Description
2D or 3D plot of the mutual similarities within a given list of sentences/documents
Usage
plot_doclist(x,connect.lines="all",method="PCA",dims=3,
axes=F,box=F,cex=1,chars=10,legend=T, size = c(800,800),
alpha="graded",alpha.grade=1,col="rainbow",
tvectors=tvectors,remove.punctuation=TRUE,...)
Arguments
x |
a character vector of |
dims |
the dimensionality of the plot; set either dims = 2 or dims = 3 |
method |
the method to be applied; either a Principal Component Analysis (method="PCA") or a Multidimensional Scaling (method="MDS") |
connect.lines |
(3d plot only) the number of closest associate words each word is connected with via line. Setting |
axes |
(3d plot only) whether axes shall be included in the plot |
box |
(3d plot only) whether a box shall be drawn around the plot |
cex |
(2d Plot only) A numerical value giving the amount by which plotting text should be magnified relative to the default. |
chars |
an integer specifying how many letters (starting from the first) of each sentence/document are to be printed in the plot |
legend |
(3d plot only) whether a legend shall be drawn illustrating the color scheme of the |
size |
(3d plot only) A numeric vector with two elements, the first specifying the width and the second specifying the height of the plot device. |
tvectors |
the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector) |
remove.punctuation |
removes punctuation from |
alpha |
(3d plot only) A numeric vector specifying the luminance of the |
alpha.grade |
(3d plot only) Only relevant if |
col |
(3d plot only) A vector specifying the color of the |
... |
additional arguments which will be passed to |
Details
Computes all pairwise similarities within a given list of sentences/documents. On this similarity matrix, a Principal Component Analysis (PCA) or a Multidimensional Scaling (MDS) is applied to get a two- or three-dimensional solution that best captures the similarity structure. This solution is then plotted.
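The reduction step described above can be sketched with base R alone. This is only an illustration under stated assumptions, not the package's internal code: the similarity matrix sim is made up, cmdscale stands in for the MDS step, and prcomp is swapped in for the PCA step.

```r
# Hypothetical cosine similarity matrix for four documents (values are made up)
sim <- matrix(c(1.0, 0.8, 0.3, 0.2,
                0.8, 1.0, 0.4, 0.1,
                0.3, 0.4, 1.0, 0.7,
                0.2, 0.1, 0.7, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), paste0("doc", 1:4)))

# MDS: convert similarities to dissimilarities, then scale to two dimensions
mds_coords <- cmdscale(as.dist(1 - sim), k = 2)

# PCA: treat each row of the similarity matrix as one observation
# and keep the scores on the first two components
pca_coords <- prcomp(sim)$x[, 1:2]

# Plot the MDS solution, labelling each point with its document name
plot(mds_coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds_coords, labels = rownames(sim))
```

Documents with high mutual similarity (here doc1/doc2 and doc3/doc4) should end up close to each other in either solution.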
In the traditional LSA approach, the vector D for a document (or a sentence) consisting of the words (t_1, ..., t_n) is computed as
D = \sum\limits_{i=1}^n t_i
This function then computes the cosines between all pairs of documents (or sentences) in the list.
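As a minimal sketch of this computation, assuming a toy semantic space with made-up values: doc_vector is a hypothetical helper written for this illustration, not a function of the package, and it simply skips words that are missing from the space.

```r
# Toy semantic space: every row is a word vector (values are made up)
tvectors <- rbind(
  alice  = c(1, 0, 2),
  queen  = c(0, 1, 1),
  hatter = c(2, 1, 0),
  tea    = c(1, 1, 0)
)

docs <- c("alice queen", "hatter tea", "alice tea")

# Hypothetical helper: document vector = sum of the vectors of its words
# (words that are not row names of the space are skipped)
doc_vector <- function(doc, space) {
  words <- strsplit(tolower(doc), "[[:space:]]+")[[1]]
  words <- words[words %in% rownames(space)]
  colSums(space[words, , drop = FALSE])
}

# One row per document
D <- t(sapply(docs, doc_vector, space = tvectors))

# Cosine similarity between all pairs of document vectors
norms <- sqrt(rowSums(D^2))
cos_matrix <- (D %*% t(D)) / (norms %o% norms)
round(cos_matrix, 3)
```

The resulting cos_matrix is symmetric with ones on the diagonal; it is the kind of similarity matrix on which the PCA or MDS is then run.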
The format of x should be of the kind x <- c("this is the first text","here is another text")
For creating pretty plots that best show the similarity structure within this list of sentences/documents, set connect.lines="all" and col="rainbow"
Value
see plot3d
: this function is called for the side effect of drawing the plot; a vector of object IDs is returned.
plot_doclist
further prints a list with two elements:
coordinates |
the coordinate vectors of the sentences/documents in the plot as a data frame |
xdocs |
A legend for the sentence/document labels in the plot and in the |
Author(s)
Fritz Guenther, Taylor Fedechko
References
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate Analysis, London: Academic Press.
See Also
cosine, multidocs, plot_neighbors, plot_wordlist, plot3d, princomp, rainbow
Examples
data(wonderland)
## Standard Plot
docs <- c("alice was beginning to get very tired.",
"the red queen greeted alice.",
"the mad hatter and the march hare are having a party.",
"the hatter sliced the cup of tea in half.")
plot_doclist(docs,tvectors=wonderland,method="MDS",dims=2)