word_cor {chinese.misc}    R Documentation

Word Correlation in DTM/TDM

Description

Given a DTM, a TDM, or a matrix, the function computes the Pearson, Spearman, or Kendall correlation between each pair of words and filters the results by p value and by a minimum correlation value. It is somewhat more flexible than tm::findAssocs.
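The pairwise computation described above can be sketched in base R with stats::cor.test. This is only an illustration of the documented behaviour, not the chinese.misc implementation: the helper name pairwise_word_cor is hypothetical, and the sketch assumes a plain matrix with documents in rows and words as column names.

```r
# Hypothetical helper, NOT part of chinese.misc: a sketch of pairwise word
# correlation on a document-by-word matrix (documents in rows, words in columns).
pairwise_word_cor <- function(dtm, words, method = "kendall") {
  # keep only the requested words that actually occur as column names
  words <- intersect(as.character(words), colnames(dtm))
  n <- length(words)
  cor_mat <- matrix(NA_real_, n, n, dimnames = list(words, words))
  p_mat <- cor_mat
  # visit each unordered pair once; the diagonal is never filled, so it stays NA
  for (i in seq_len(max(n - 1L, 0L))) {
    for (j in (i + 1L):n) {
      # rank methods warn that exact p values are unavailable with ties
      ct <- suppressWarnings(
        cor.test(dtm[, words[i]], dtm[, words[j]], method = method)
      )
      cor_mat[i, j] <- cor_mat[j, i] <- unname(ct$estimate)
      p_mat[i, j] <- p_mat[j, i] <- ct$p.value
    }
  }
  list(cor_mat, p_mat)  # correlation matrix, then p value matrix
}

set.seed(1)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 20,
            dimnames = list(NULL, c("alpha", "apple", "cake", "data", "r")))
res <- pairwise_word_cor(m, c("alpha", "cake", "zebra"))
print(res[[1]])  # 2 x 2 matrix: "zebra" is not a column of m, so it is dropped
```

Each unordered pair is tested once and mirrored, which halves the number of cor.test calls compared with filling every cell.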

Usage

word_cor(x, word, type = "dtm", method = "kendall", p = NULL, min = NULL)

Arguments

x

a DocumentTermMatrix or TermDocumentMatrix object, or a matrix. If x is a matrix, you must specify its orientation with the argument type; it must not contain NA, and the row names or column names that hold the words must not be NULL.

word

a character vector of the words whose pairwise correlations you want to compute. If it is not a character vector, the function tries to coerce it into one. Its length should not exceed 200. Only words that actually occur in the data are used; words that do not occur in the data are excluded from the result.

type

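if it starts with "d" or "D", x is treated as a DTM (documents in rows, words in columns); if it starts with "t" or "T", x is treated as a TDM (words in rows, documents in columns). Other values are invalid. This argument is used only when x is a matrix. The default is "dtm".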
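A minimal sketch of how the type argument could be resolved, assuming the matrix is normalised to documents-in-rows before correlating. The helper name as_doc_by_word is hypothetical and not part of chinese.misc, and raising an error for invalid values is an assumption about how "not valid" is handled.

```r
# Hypothetical helper (an assumption, NOT chinese.misc's code): return a plain
# matrix with documents in rows and words in columns.
as_doc_by_word <- function(x, type = "dtm") {
  first <- tolower(substr(type, 1, 1))      # only the first letter matters
  if (identical(first, "d")) return(x)      # DTM: already documents in rows
  if (identical(first, "t")) return(t(x))   # TDM: words in rows, so transpose
  stop("type must start with \"d/D\" or \"t/T\"")
}

mt <- matrix(1:6, nrow = 2, dimnames = list(c("apple", "cake"), NULL))
colnames(as_doc_by_word(mt, "T"))  # the words become column names after t()
```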

method

the correlation index to compute: "pearson", "spearman", or "kendall". The value is passed to stats::cor.test. The default is "kendall".

p

if the p value of a correlation index is >= this value, the index is converted to NA in the correlation matrix. The default is NULL, which means no filtering by p value. Note: if both p and min are non-NULL, the two conditions are combined with "or" rather than "and", i.e. an index is converted to NA if either condition holds.

min

if the correlation index is smaller than this value, it is converted to NA. The default is NULL, which means no filtering by correlation value.
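The filtering rules for p and min can be sketched as a mask over the two matrices, following the "or" relation stated for p. This is a hypothetical helper (filter_cor is not a chinese.misc function), written under the assumption that a NULL threshold simply skips its filter.

```r
# Hypothetical helper (an assumption, NOT chinese.misc's code): an entry becomes
# NA if its p value is >= p OR its correlation is < min; NULL disables a filter.
filter_cor <- function(cor_mat, p_mat, p = NULL, min = NULL) {
  mask <- matrix(FALSE, nrow(cor_mat), ncol(cor_mat))
  if (!is.null(p))   mask <- mask | (!is.na(p_mat) & p_mat >= p)
  if (!is.null(min)) mask <- mask | (!is.na(cor_mat) & cor_mat < min)
  cor_mat[mask] <- NA
  cor_mat
}

cm <- matrix(c(NA, 0.5, 0.05, 0.5, NA, -0.2, 0.05, -0.2, NA), 3)
pm <- matrix(c(NA, 0.01, 0.02, 0.01, NA, 0.6, 0.02, 0.6, NA), 3)
out <- filter_cor(cm, pm, p = 0.4, min = 0.1)
out  # only the 0.5 pair survives: 0.05 fails min, -0.2 fails both p and min
```

Because min compares the raw value, a strong negative correlation is also dropped whenever min is positive.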

Value

a list of two elements: the first is the correlation matrix and the second is the p value matrix. In both matrices the diagonal is set to NA.

Examples

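set.seed(1)
# a 20 x 5 matrix of random counts: documents in rows, words in columns (a DTM)
s <- sample(1:10, 100, replace = TRUE)
m <- matrix(s, nrow = 20)
myword <- c("alpha", "apple", "cake", "data", "r")
colnames(m) <- myword
# Kendall correlation (the default), no filtering
mycor1 <- word_cor(m, myword)
# Pearson correlation; NA where p >= 0.4 or correlation < 0.1
mycor2 <- word_cor(m, myword, method = "pearson", min = 0.1, p = 0.4)
# the same data as a TDM (words in rows), so type must start with "t/T"
mt <- t(m)
mycor3 <- word_cor(mt, myword, type = "T", method = "spearman", p = 0.5)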

[Package chinese.misc version 0.2.3 Index]