dtm_compare {corpustools} R Documentation

## Compare two document term matrices

### Description

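Compare the terms in two document-term matrices, returning per-term statistics such as the ratio of relative frequencies and the chi^2 value.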

### Usage

    dtm_compare(
      dtm.x,
      dtm.y = NULL,
      smooth = 0.1,
      min_ratio = NULL,
      min_chi2 = NULL,
      select_rows = NULL,
      yates_cor = c("auto", "yes", "no"),
      x_is_subset = F,
      what = c("freq", "docfreq", "cooccurrence")
    )


### Arguments

| Argument | Description |
|---|---|
| `dtm.x` | the main document-term matrix |
| `dtm.y` | the 'reference' document-term matrix |
| `smooth` | Laplace smoothing is used for the calculation of the probabilities. Here you can set the added (pseudocount) value. |
| `min_ratio` | threshold for the ratio value, which is the ratio of the relative frequency of a term in dtm.x and dtm.y |
| `min_chi2` | threshold for the chi^2 value |
| `select_rows` | Alternative to using dtm.y. Has to be a vector with rownames, by which |
| `yates_cor` | mode for using Yates' correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule is used to determine whether Yates' correction is used. |
| `x_is_subset` | Specify whether dtm.x is a subset of dtm.y. In this case, the term frequencies of dtm.x will be subtracted from the term frequencies in dtm.y. |
| `what` | choose whether to compare the frequency ("freq") of terms, or the document frequency ("docfreq"). This also affects how chi^2 is calculated, comparing either freq relative to vocabulary size or docfreq relative to corpus size (N). |
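
To make the `smooth`, ratio and chi^2 descriptions above more concrete, here is a minimal base-R sketch of how such statistics can be computed from two vectors of term counts. It is an illustration under stated assumptions, not the corpustools implementation: `compare_counts` and its `yates` argument are hypothetical names, the chi^2 is the standard Pearson statistic for a 2x2 table, and the "auto" mode based on Cochran's rule is not reproduced.

```r
# Hypothetical helper (not part of corpustools): compares two named vectors
# of term counts that cover the same vocabulary in the same order.
compare_counts <- function(freq_x, freq_y, smooth = 0.1, yates = FALSE) {
  # Laplace smoothing: add the pseudocount to every term before normalising
  p_x <- (freq_x + smooth) / sum(freq_x + smooth)
  p_y <- (freq_y + smooth) / sum(freq_y + smooth)
  ratio <- p_x / p_y

  # Cells of the per-term 2x2 table: this term vs. all other terms, x vs. y
  x_term  <- freq_x
  y_term  <- freq_y
  x_other <- sum(freq_x) - freq_x
  y_other <- sum(freq_y) - freq_y
  n <- x_term + y_term + x_other + y_other

  # Pearson chi^2 for a 2x2 table, optionally with Yates' continuity correction
  diff <- abs(x_term * y_other - y_term * x_other)
  if (yates) diff <- pmax(diff - n / 2, 0)
  chi2 <- n * diff^2 /
    ((x_term + y_term) * (x_other + y_other) *
     (x_term + x_other) * (y_term + y_other))

  data.frame(term = names(freq_x), freq_x, freq_y, ratio, chi2,
             row.names = NULL)
}

# Toy counts, invented for illustration only
x <- c(tax = 30, health = 10, war = 5)
y <- c(tax = 10, health = 10, war = 25)
res <- compare_counts(x, y)
res
```

In this sketch a ratio above 1 marks a term that is relatively more frequent in `x` than in `y`. Thresholds in the spirit of `min_ratio` and `min_chi2` could then be applied by subsetting, for example `res[res$ratio >= 2 & res$chi2 >= 10, ]`.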

### Value

A data frame with one row per term in the dtm and the statistics in the columns.

[Package corpustools version 0.4.10 Index]