R: 2D- or 3D-Plot of a list of sentences/documents

plot_doclist {LSAfun}

R Documentation

2D- or 3D-Plot of a list of sentences/documents

Description

2D or 3D plot of the mutual similarities within a given list of sentences/documents

Usage

plot_doclist(x,connect.lines="all",method="PCA",dims=3,
   axes=F,box=F,cex=1,chars=10,legend=T, size = c(800,800),
   alpha="graded",alpha.grade=1,col="rainbow",
   tvectors=tvectors,remove.punctuation=TRUE,...)

Arguments

`x`	a character vector of `length(x) > 1` that contains multiple sentences/documents
`dims`	the dimensionality of the plot; set either `dims = 2` or `dims = 3`
`method`	the method to be applied; either a Principal Component Analysis (`method="PCA"`) or a Multidimensional Scaling (`method="MDS"`)
`connect.lines`	(3d plot only) the number of closest associates each sentence/document is connected with via a line. Setting `connect.lines="all"` (default) will draw all connecting lines and will automatically apply `alpha="graded"`.
`axes`	(3d plot only) whether axes shall be included in the plot
`box`	(3d plot only) whether a box shall be drawn around the plot
`cex`	(2d plot only) A numerical value giving the amount by which plotting text should be magnified relative to the default.
`chars`	an integer specifying how many letters (starting from the first) of each sentence/document are to be printed in the plot
`legend`	(3d plot only) whether a legend shall be drawn illustrating the color scheme of the `connect.lines`. The legend is inserted as a background bitmap to the plot using `bgplot3d`. Therefore, it does not resize very gracefully (see the `bgplot3d` documentation for more information).
`size`	(3d plot only) A numeric vector with two elements, the first specifying the width and the second specifying the height of the plot device.
`tvectors`	the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
`remove.punctuation`	removes punctuation from `x`; `TRUE` by default
`alpha`	(3d plot only) A numeric vector specifying the luminance of the `connect.lines`. By setting `alpha="graded"`, the luminance of every line will be adjusted to the cosine between the two sentences/documents it connects.
`alpha.grade`	(3d plot only) Only relevant if `alpha="graded"`. Specify a numeric value for `alpha.grade` to scale the luminance of all `connect.lines` up (`alpha.grade` > 1) or down (`alpha.grade` < 1) by that factor.
`col`	(3d plot only) A vector specifying the color of the `connect.lines`. By setting `col ="rainbow"` (default), the color of every line will be adjusted to the cosine between the two sentences/documents it connects, according to the rainbow palette. Other available color palettes for this purpose are `heat.colors`, `terrain.colors`, `topo.colors`, and `cm.colors` (see `rainbow`). Additionally, you can use any custom color scale by providing more than one color (for example `col = c("black","blue","red")`).
`...`	additional arguments which will be passed to `plot3d` (in a three-dimensional plot only)
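To make `connect.lines` and `alpha="graded"` more concrete, the sketch below shows one plausible reading of them: each sentence/document is linked to its `k` most similar neighbours, and each line's opacity is the cosine of the pair it connects, multiplied by `alpha.grade` and clipped to [0, 1]. This mapping is an assumption for illustration, not LSAfun's internal code, and `select_lines` is a hypothetical helper name.

```r
# Hypothetical helper, NOT part of LSAfun: one plausible reading of
# `connect.lines` (k closest associates per item) and `alpha = "graded"`.
select_lines <- function(sim, k, alpha.grade = 1) {
  n <- nrow(sim)
  diag(sim) <- -Inf                 # an item is never its own associate
  k <- min(k, n - 1)
  edges <- NULL
  for (i in seq_len(n)) {
    # indices of the k most similar other items
    nb <- order(sim[i, ], decreasing = TRUE)[seq_len(k)]
    edges <- rbind(edges, data.frame(from = i, to = nb, cos = sim[i, nb]))
  }
  # assumed mapping: opacity = cosine * alpha.grade, clipped to [0, 1]
  edges$alpha <- pmin(pmax(edges$cos * alpha.grade, 0), 1)
  edges
}

# toy cosine matrix for three documents
sim <- matrix(c(1, .9, .2,
                .9, 1, .4,
                .2, .4, 1), nrow = 3)
select_lines(sim, k = 1)
```

Under this reading, `connect.lines="all"` would correspond to `k = n - 1`. A pair can appear twice (once from each endpoint); a plotting routine would draw it only once.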

Details

Computes all pairwise similarities within a given list of sentences/documents. On this similarity matrix, a Principal Component Analysis (PCA) or a Multidimensional Scaling (MDS) is applied to get a two- or three-dimensional solution that best captures the similarity structure. This solution is then plotted.
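The dimension-reduction step can be sketched in base R, assuming the pairwise cosine matrix is already available. For PCA, the sketch treats each row of the similarity matrix as that document's feature vector and calls `prcomp`. For MDS, it converts cosines to Euclidean distances between unit-length vectors, `sqrt(2 * (1 - cos))`, and calls `cmdscale`. Both conversions are assumptions; LSAfun may preprocess the matrix differently, and `reduce_sim` is a hypothetical name.

```r
# Hypothetical helper, NOT LSAfun's code: reduce a cosine similarity
# matrix to `dims` coordinates per document.
reduce_sim <- function(sim, method = c("PCA", "MDS"), dims = 2) {
  method <- match.arg(method)
  if (method == "PCA") {
    # assumption: each row of the similarity matrix is a feature vector
    coords <- prcomp(sim)$x[, seq_len(dims), drop = FALSE]
  } else {
    # assumption: cosines -> Euclidean distances between unit vectors;
    # pmax guards against tiny negative values from rounding errors
    d <- as.dist(sqrt(pmax(2 * (1 - sim), 0)))
    coords <- cmdscale(d, k = dims)   # classical (Torgerson) MDS
  }
  as.data.frame(coords)
}

# usage with four random unit-length "document vectors"
set.seed(1)
m <- matrix(rnorm(20), nrow = 4)
u <- m / sqrt(rowSums(m^2))
sim <- tcrossprod(u)                 # 4 x 4 cosine matrix
reduce_sim(sim, method = "MDS", dims = 2)
```

The `sqrt(2 * (1 - cos))` choice keeps the MDS input a true Euclidean distance, so classical MDS can reproduce it exactly when enough dimensions are kept.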

In the traditional LSA approach, the vector D for a document (or a sentence) consisting of the words (t_1, ..., t_n) is computed as

D = \sum\limits_{i=1}^{n} t_i
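The summation rule can be illustrated with a toy semantic space. Here `tvectors` is a made-up 3-dimensional matrix with one row per word (real spaces such as `wonderland` have many more dimensions). `doc_vector` is a hypothetical helper that lowercases the text, splits on whitespace, ignores words missing from the space, and sums the remaining rows. The tokenization details are assumptions, not LSAfun's exact procedure.

```r
# toy semantic space: one row per word (values invented for illustration)
tvectors <- rbind(
  alice = c(1, 0, 0),
  queen = c(0, 1, 0),
  red   = c(0, 1, 1),
  tea   = c(0, 0, 1)
)

# Hypothetical helper, NOT LSAfun's code: D = t_1 + ... + t_n
doc_vector <- function(doc, tvectors) {
  words <- strsplit(tolower(doc), "[[:space:]]+")[[1]]
  words <- words[words %in% rownames(tvectors)]   # ignore unknown words
  colSums(tvectors[words, , drop = FALSE])        # sum of word vectors
}

doc_vector("the red queen greeted alice", tvectors)
```

Only `red`, `queen` and `alice` are in the toy space, so the result is their sum, `c(1, 2, 1)`.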

This function then computes the cosines between all pairs of sentences/documents in `x`.
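Once each sentence/document has a vector (stacked as the rows of a matrix), all pairwise cosines can be obtained at once by scaling every row to unit length and taking the cross product. `cosine_matrix` is a hypothetical helper for illustration and does not reproduce the package's own cosine functions.

```r
# Hypothetical helper, NOT LSAfun's code: all pairwise cosines between
# document vectors stored as the rows of a matrix.
cosine_matrix <- function(docvecs) {
  unit <- docvecs / sqrt(rowSums(docvecs^2))  # unit-length rows (zero rows give NaN)
  tcrossprod(unit)                            # entry [i, j] = cos(doc i, doc j)
}

docvecs <- rbind(d1 = c(1, 2, 1),
                 d2 = c(0, 0, 1),
                 d3 = c(2, 4, 2))
round(cosine_matrix(docvecs), 3)
```

Because `d3` is a multiple of `d1`, their cosine is 1 even though their lengths differ; the resulting matrix is what the PCA or MDS step operates on.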

The format of `x` should be of the form `x <- c("this is the first text","here is another text")`.
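For illustration, `x` is a plain character vector with one element per sentence/document. The `gsub` call shows one way punctuation could be stripped before word lookup, similar in spirit to `remove.punctuation = TRUE`; the exact pattern LSAfun applies is an assumption here.

```r
# one element per sentence/document; plot_doclist needs length(x) > 1
x <- c("this is the first text", "here is another text!")
stopifnot(is.character(x), length(x) > 1)

# assumption: punctuation stripping similar to remove.punctuation = TRUE
# (LSAfun's exact pattern may differ)
x_clean <- gsub("[[:punct:]]", "", x)
x_clean
```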

To create plots that best show the similarity structure within the list of sentences/documents, set `connect.lines="all"` and `col="rainbow"` (3d plots only).

Value

See `plot3d`: this function is called for the side effect of drawing the plot; a vector of object IDs is returned.

plot_doclist further prints a list with two elements:

`coordinates`	the coordinate vectors of the sentences/documents in the plot as a data frame
`xdocs`	A legend for the sentence/document labels in the plot and in the `coordinates`

Author(s)

Fritz Guenther, Taylor Fedechko

References

Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.

Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate Analysis, London: Academic Press.

Examples

library(LSAfun)
data(wonderland)

## Standard Plot

docs <- c("alice was beginning to get very tired.",
          "the red queen greeted alice.",
          "the mad hatter and the march hare are having a party.",
          "the hatter sliced the cup of tea in half.")

plot_doclist(docs,tvectors=wonderland,method="MDS",dims=2)

[Package LSAfun version 0.7.1 Index]