R: Word Correlation in DTM/TDM

word_cor {chinese.misc}

R Documentation

Word Correlation in DTM/TDM

Description

Given a DTM, TDM, or plain matrix, the function computes the Pearson, Spearman, or Kendall correlation between pairs of words, and can filter the results by p value and by a minimum correlation value. It is a little more flexible than tm::findAssocs.

Usage

word_cor(x, word, type = "dtm", method = "kendall", p = NULL, min = NULL)

Arguments

`x`	a DocumentTermMatrix object, a TermDocumentMatrix object, or a matrix. If it is a matrix, you must specify its type with the argument `type`; in addition, `NA` is not allowed, and the rownames/colnames that are taken as words should not be `NULL`.
`word`	a character vector of the words whose correlations in your data you want to know. If it is not a vector, the function will try to coerce it. Its length should not be larger than 200. The function only computes correlations for words that exist in the data; words not in the data are not included.
`type`	if it starts with "d/D", it represents a DTM; if with "t/T", TDM; others are not valid. This is only used when x is a matrix. The default is "dtm".
`method`	which index is to be computed. It can only be "pearson", "spearman", or "kendall" (default). The method is passed to `stats::cor.test`.
`p`	if the p value of a correlation index is >= this value, the index will be converted to `NA` in the correlation matrix. The default is `NULL`, which means no filtering is done. Note: if both `p` and `min` are non-`NULL`, their relation is "or" rather than "and".
`min`	if the correlation index is smaller than this value, it will be converted to `NA`. The default is `NULL`, which means no filtering is done.
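
The way these arguments interact can be sketched in base R. The helper below, `pairwise_word_cor`, is a hypothetical illustration and not the chinese.misc source. It assumes a document-by-term matrix with words as column names (a term-by-document matrix would be transposed with `t()` first), and it reads the "or" note as meaning that a correlation is set to `NA` when it fails either the `p` filter or the `min` filter. The `exact = FALSE` setting is also an assumption, chosen to avoid warnings about ties.

```r
# Hypothetical helper, NOT the package's implementation: pairwise
# correlations between selected words (columns) of a document-by-term matrix.
pairwise_word_cor <- function(m, word, method = "kendall", p = NULL, min = NULL) {
  # Keep only the requested words that actually occur in the data
  word <- intersect(unique(as.character(word)), colnames(m))
  n <- length(word)
  stopifnot(n >= 2)
  cor_mat <- matrix(NA_real_, n, n, dimnames = list(word, word))
  p_mat <- cor_mat
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      ct <- stats::cor.test(m[, word[i]], m[, word[j]],
                            method = method, exact = FALSE)
      cor_mat[i, j] <- cor_mat[j, i] <- unname(ct$estimate)
      p_mat[i, j] <- p_mat[j, i] <- ct$p.value
    }
  }
  # Assumed reading of the "or" note: failing either filter masks the value
  if (!is.null(p)) cor_mat[!is.na(p_mat) & p_mat >= p] <- NA
  if (!is.null(min)) cor_mat[!is.na(cor_mat) & cor_mat < min] <- NA
  list(cor_mat, p_mat)  # the diagonal is never filled, so it stays NA
}

set.seed(1)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 20,
            dimnames = list(NULL, c("alpha", "apple", "cake", "data", "r")))
# "missing" is not a column of m, so it is silently dropped
res <- pairwise_word_cor(m, c("alpha", "cake", "r", "missing"),
                         method = "pearson", p = 0.4, min = 0.1)
res0 <- pairwise_word_cor(m, c("alpha", "cake"), method = "pearson")
```

With both filters set, any value that survives in `res[[1]]` has a p value below 0.4 and a correlation of at least 0.1; `res[[2]]` is left unfiltered in this sketch.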

Value

a list. The 1st element is the correlation matrix with diagonal converted to NA. The 2nd element is the p value matrix with diagonal converted to NA.
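
As a rough illustration of working with a result of this shape, the sketch below builds a hypothetical stand-in list by hand (the numbers are invented for illustration and are not output of `word_cor`) and ranks the words associated with one target word. Only base R indexing and `sort` are used; the element order follows the description above.

```r
# Hypothetical stand-in for a word_cor() result: element 1 is the correlation
# matrix, element 2 the p value matrix, both with NA on the diagonal.
# All numbers are made up for illustration.
words <- c("alpha", "apple", "cake")
cor_mat <- matrix(c(NA,    0.42, -0.10,
                    0.42,  NA,    0.25,
                    -0.10, 0.25,  NA),
                  nrow = 3, dimnames = list(words, words))
p_mat <- matrix(c(NA,   0.03, 0.60,
                  0.03, NA,   0.20,
                  0.60, 0.20, NA),
                nrow = 3, dimnames = list(words, words))
res <- list(cor_mat, p_mat)

# Words ranked by correlation with "alpha"; sort() drops NA by default,
# so the diagonal entry disappears from the ranking
top_alpha <- sort(res[[1]]["alpha", ], decreasing = TRUE)

# Matching p values, looked up by word name
p_alpha <- res[[2]]["alpha", names(top_alpha)]
```

Because the diagonal is `NA`, a word never appears as its own strongest associate, which keeps rankings like `top_alpha` free of trivial self-correlations.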

Examples

set.seed(1)
s <- sample(1:10, 100, replace = TRUE)
m <- matrix(s, nrow = 20)
myword <- c("alpha", "apple", "cake", "data", "r")
colnames(m) <- myword
mycor1 <- word_cor(m, myword)
mycor2 <- word_cor(m, myword, method = "pearson", min = 0.1, p = 0.4)
mt <- t(m)
mycor3 <- word_cor(mt, myword, type = "T", method = "spearman", p = 0.5)

[Package chinese.misc version 0.2.3 Index]