word_cor {chinese.misc} | R Documentation |
Word Correlation in DTM/TDM
Description
Given a DTM/TDM/matrix, the function computes the Pearson/Spearman/Kendall
correlation between pairs of words and filters the values by p value and by a minimum correlation value.
It is a little more flexible than tm::findAssocs.
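As a rough illustration of the idea (not the package's actual implementation), the pairwise computation and filtering can be sketched in base R with stats::cor.test. The helper name pairwise_cor_sketch is hypothetical, and replacing filtered values with NA is an assumption made for this sketch only.

```r
# Hypothetical helper: pairwise correlations between the columns of a matrix,
# with optional filtering by p value and by minimum correlation.
pairwise_cor_sketch <- function(m, method = "kendall", p = NULL, min = NULL) {
  k <- ncol(m)
  r_mat <- matrix(NA_real_, k, k, dimnames = list(colnames(m), colnames(m)))
  p_mat <- r_mat
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # exact = FALSE requests the asymptotic p value for rank-based methods
      ct <- suppressWarnings(
        stats::cor.test(m[, i], m[, j], method = method, exact = FALSE)
      )
      r_mat[i, j] <- r_mat[j, i] <- unname(ct$estimate)
      p_mat[i, j] <- p_mat[j, i] <- ct$p.value
    }
  }
  # Assumption: correlations that fail a filter are replaced with NA
  if (!is.null(p)) r_mat[!is.na(p_mat) & p_mat >= p] <- NA
  if (!is.null(min)) r_mat[!is.na(r_mat) & r_mat < min] <- NA
  list(r_mat, p_mat)
}

set.seed(1)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 20)
colnames(m) <- c("alpha", "apple", "cake", "data", "r")
res <- pairwise_cor_sketch(m, method = "spearman", p = 0.5)
```

Here the columns of m play the role of words in a DTM; for a TDM the matrix would be transposed first.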
Usage
word_cor(x, word, type = "dtm", method = "kendall", p = NULL, min = NULL)
Arguments
x |
a DocumentTermMatrix, TermDocumentMatrix object, or a matrix. If it is a matrix,
you must specify its type with the argument type. |
word |
a character vector of words whose correlations in your data you want to know. If it is not a vector, the function will try to coerce it. Its length should not be larger than 200. The function only computes correlations for words that exist in the data; words not in the data are not included. |
type |
if it starts with "d" or "D", x is treated as a DTM; if it starts with "t" or "T", as a TDM; other values are not valid. This is only used when x is a matrix. The default is "dtm". |
method |
what index is to be computed? It can only be "pearson", "spearman", or "kendall"
(default). The method is passed to |
p |
if the p value of a correlation index is >= this value, the index will be converted to |
min |
if the correlation index is smaller than this value, it will be converted to |
Value
a list. The 1st element is the correlation matrix with the diagonal converted to NA.
The 2nd element is the p value matrix with the diagonal converted to NA.
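A minimal sketch of working with a result of this shape, using a stand-in list built with base R rather than word_cor itself. The names res, cor_mat, and pairs are illustrative, and the stand-in uses Pearson correlations only.

```r
# Build a stand-in result: a list of two matrices with NA diagonals.
set.seed(1)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 20)
colnames(m) <- c("alpha", "apple", "cake", "data", "r")
cor_mat <- stats::cor(m, method = "pearson")
p_mat <- outer(
  seq_len(ncol(m)), seq_len(ncol(m)),
  Vectorize(function(i, j) stats::cor.test(m[, i], m[, j])$p.value)
)
dimnames(p_mat) <- dimnames(cor_mat)
diag(cor_mat) <- NA
diag(p_mat) <- NA
res <- list(cor_mat, p_mat)

# Access the elements by position and list each word pair once,
# sorted from the strongest positive correlation downwards.
cor_mat <- res[[1]]
idx <- which(upper.tri(cor_mat) & !is.na(cor_mat), arr.ind = TRUE)
pairs <- data.frame(
  word1 = rownames(cor_mat)[idx[, 1]],
  word2 = colnames(cor_mat)[idx[, 2]],
  cor = cor_mat[idx],
  p = res[[2]][idx]
)
pairs <- pairs[order(-pairs$cor), ]
```

Because both matrices are symmetric, taking only the upper triangle avoids listing each pair twice.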
Examples
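set.seed(1)
# Simulate a 20-document by 5-word count matrix
s <- sample(1:10, 100, replace = TRUE)
m <- matrix(s, nrow = 20)
myword <- c("alpha", "apple", "cake", "data", "r")
colnames(m) <- myword
# Defaults: x is treated as a DTM and Kendall correlation is used
mycor1 <- word_cor(m, myword)
# Pearson correlation, filtered by minimum correlation and p value
mycor2 <- word_cor(m, myword, method = "pearson", min = 0.1, p = 0.4)
# Transpose to get a TDM-style matrix and declare its type
mt <- t(m)
mycor3 <- word_cor(mt, myword, type = "T", method = "spearman", p = 0.5)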