dtm_compare {corpustools}    R Documentation

Compare two document term matrices

Description

Compare two document term matrices

Usage

dtm_compare(
  dtm.x,
  dtm.y = NULL,
  smooth = 0.1,
  min_ratio = NULL,
  min_chi2 = NULL,
  select_rows = NULL,
  yates_cor = c("auto", "yes", "no"),
  x_is_subset = F,
  what = c("freq", "docfreq", "cooccurrence")
)

Arguments

dtm.x

the main document-term matrix

dtm.y

the 'reference' document-term matrix

smooth

Laplace smoothing is used when calculating the probabilities. This sets the value (pseudocount) that is added.

min_ratio

threshold for the ratio value, which is the relative frequency of a term in dtm.x divided by its relative frequency in dtm.y

min_chi2

threshold for the chi^2 value

select_rows

Alternative to using dtm.y. Has to be a vector with rownames, by which

yates_cor

mode for using Yates' correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule is used to determine whether Yates' correction is applied.

x_is_subset

Specify whether dtm.x is a subset of dtm.y. If so, the term frequencies of dtm.x are subtracted from the term frequencies in dtm.y.

what

choose whether to compare the frequency ("freq") of terms, or the document frequency ("docfreq"). This also affects how chi^2 is calculated, comparing either freq relative to vocabulary size or docfreq relative to corpus size (N)
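
As a rough illustration of how the what, x_is_subset, smooth and min_ratio arguments relate, the following base R sketch computes term frequencies, document frequencies and a smoothed ratio for two toy matrices. It does not call dtm_compare; the toy data, the variable names and the exact add-k smoothing formula are assumptions made for illustration, and the package's internal calculation may differ in detail.

```r
# Toy document-term matrices (rows = documents, columns = terms).
# All values are made up for illustration.
dtm_y <- matrix(c(2, 0, 1,
                  0, 3, 1,
                  1, 1, 0,
                  4, 0, 0),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("doc", 1:4),
                                c("apple", "bank", "cat")))
# Here dtm.x is a subset of the documents in dtm.y
dtm_x <- dtm_y[c("doc1", "doc4"), , drop = FALSE]

# what = "freq": total number of occurrences per term
freq_x <- colSums(dtm_x)
freq_y <- colSums(dtm_y)

# what = "docfreq": number of documents that contain each term
docfreq_x <- colSums(dtm_x > 0)
docfreq_y <- colSums(dtm_y > 0)

# x_is_subset: remove the counts of dtm.x from the reference counts,
# so that dtm.x is compared against the remaining documents only
freq_y_rest <- freq_y - freq_x

# Laplace (add-k) smoothing, in one common form (an assumption here):
# add the pseudocount to every term count before normalising
smooth <- 0.1
p_x <- (freq_x + smooth) / sum(freq_x + smooth)
p_y <- (freq_y_rest + smooth) / sum(freq_y_rest + smooth)

# Ratio of relative frequencies; values above 1 indicate terms that are
# relatively more frequent in dtm.x than in the reference
ratio <- p_x / p_y
print(round(ratio, 3))
```

Without smoothing, a term that does not occur in the reference would give a division by zero; with the pseudocount, every ratio stays finite.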

Value

A data frame with rows corresponding to the terms in dtm and the statistics in the columns
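
To make the chi^2 statistic and the yates_cor modes more concrete, here is a minimal base R sketch that builds a per-term 2x2 table (term versus all other terms, dtm.x versus dtm.y) and returns a data frame with one row per term. The helper names, the column names and the toy counts are hypothetical, and the "auto" criterion shown (apply Yates' correction when any expected cell count is below 5) is one common reading of Cochran's rule for 2x2 tables, not necessarily the exact criterion the package uses.

```r
# Chi^2 for a 2x2 table, vectorised over terms:
#               term    other terms
#   dtm.x        x1         x0
#   dtm.y        y1         y0
chi2_2x2 <- function(x1, x0, y1, y0, yates = FALSE) {
  n <- x1 + x0 + y1 + y0
  yates <- rep_len(yates, length(n))
  diff <- abs(x1 * y0 - x0 * y1)
  # Yates' continuity correction: shrink |ad - bc| by n / 2 (not below 0)
  diff <- ifelse(yates, pmax(diff - n / 2, 0), diff)
  n * diff^2 / ((x1 + x0) * (y1 + y0) * (x1 + y1) * (x0 + y0))
}

# Assumed "auto" criterion: TRUE if any expected cell count is below 5
small_expected <- function(x1, x0, y1, y0) {
  n <- x1 + x0 + y1 + y0
  expected <- cbind((x1 + x0) * (x1 + y1), (x1 + x0) * (x0 + y0),
                    (y1 + y0) * (x1 + y1), (y1 + y0) * (x0 + y0)) / n
  rowSums(expected < 5) > 0
}

# Toy term frequencies (made up for illustration)
freq_x <- c(apple = 30, bank = 2, cat = 68)
freq_y <- c(apple = 10, bank = 6, cat = 84)
other_x <- sum(freq_x) - freq_x
other_y <- sum(freq_y) - freq_y

use_yates <- small_expected(freq_x, other_x, freq_y, other_y)

result <- data.frame(
  term   = names(freq_x),
  freq.x = unname(freq_x),
  freq.y = unname(freq_y),
  yates  = unname(use_yates),
  chi2   = unname(chi2_2x2(freq_x, other_x, freq_y, other_y, use_yates)),
  row.names = NULL
)
print(result)
```

In this sketch only the rare term receives the correction, because its expected counts are small; for the frequent terms the uncorrected statistic is used.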


[Package corpustools version 0.5.1 Index]