dtm_compare {corpustools} | R Documentation |

Compare two document term matrices

```
dtm_compare(
dtm.x,
dtm.y = NULL,
smooth = 0.1,
min_ratio = NULL,
min_chi2 = NULL,
select_rows = NULL,
yates_cor = c("auto", "yes", "no"),
x_is_subset = F,
what = c("freq", "docfreq", "cooccurrence")
)
```

`dtm.x` |
the main document-term matrix |

`dtm.y` |
the 'reference' document-term matrix |

`smooth` |
Laplace smoothing is used when calculating the probabilities. This sets the added (pseudocount) value. |

`min_ratio` |
threshold for the ratio value, which is the relative frequency of a term in dtm.x divided by its relative frequency in dtm.y |

`min_chi2` |
threshold for the chi^2 value |

`select_rows` |
Alternative to using dtm.y. Has to be a vector with rownames, by which |

`yates_cor` |
mode for using Yates' correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule is used to determine whether Yates' correction is applied. |

`x_is_subset` |
Specify whether dtm.x is a subset of dtm.y. In this case, the term frequencies of dtm.x will be subtracted from the term frequencies in dtm.y |

`what` |
choose whether to compare the term frequency ("freq") or the document frequency ("docfreq"). This also affects how chi^2 is calculated, comparing either freq relative to vocabulary size or docfreq relative to corpus size (N) |
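
The role of `smooth` and the ratio used by `min_ratio` can be illustrated with a minimal base R sketch. The toy counts and variable names below are hypothetical, and the exact normalisation (adding the pseudocount to every term before dividing by the total) is an assumption for illustration, not necessarily how the package computes it internally.

```r
# Hypothetical term counts for the main (x) and reference (y) corpus.
freq_x <- c(apple = 8, banana = 0, cherry = 2)
freq_y <- c(apple = 2, banana = 5, cherry = 3)
smooth <- 0.1  # the pseudocount, same default as in dtm_compare

# Add the pseudocount to every term, then normalise to probabilities.
# Assumption: the denominator includes the pseudocounts as well.
p_x <- (freq_x + smooth) / sum(freq_x + smooth)
p_y <- (freq_y + smooth) / sum(freq_y + smooth)

# A ratio above 1 means the term is relatively more frequent in x.
# Thanks to the smoothing, a zero count never yields 0/0 or division by zero.
ratio <- p_x / p_y
print(round(ratio, 3))
```

Without smoothing, a term such as `banana` that never occurs in one corpus would give a ratio of exactly 0 (or an infinite ratio in the other direction), so a larger `smooth` value pulls extreme ratios for rare terms towards 1.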

A data frame with one row per term in the dtm and the statistics in the columns
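
To make the chi^2 column and the `yates_cor` modes more concrete, here is a rough base R sketch that builds a per-term data frame from a 2x2 table (term vs. all other terms, x vs. y). The helper `chi2_2x2`, the toy counts, and the column names are hypothetical. The "auto" branch applies Yates' correction whenever any expected cell count falls below 5, which is one common reading of Cochran's rule for 2x2 tables and is an assumption here, not a statement about the package's source.

```r
# Hypothetical helper: chi^2 for 2x2 tables, vectorised over terms.
# x_term / y_term: count of the term in x and y
# x_rest / y_rest: count of all other terms in x and y
chi2_2x2 <- function(x_term, y_term, x_rest, y_rest,
                     yates = c("auto", "yes", "no")) {
  yates <- match.arg(yates)
  n <- x_term + y_term + x_rest + y_rest
  row1 <- x_term + y_term; row2 <- x_rest + y_rest
  col1 <- x_term + x_rest; col2 <- y_term + y_rest
  # Smallest expected cell count under independence.
  e_min <- pmin(row1 * col1, row1 * col2, row2 * col1, row2 * col2) / n
  # Assumed form of Cochran's rule: correct when any expected count < 5.
  use_yates <- switch(yates,
                      yes  = rep(TRUE, length(n)),
                      no   = rep(FALSE, length(n)),
                      auto = e_min < 5)
  d <- abs(x_term * y_rest - y_term * x_rest)
  d <- ifelse(use_yates, pmax(d - n / 2, 0), d)  # Yates' continuity correction
  n * d^2 / (row1 * row2 * col1 * col2)
}

# Toy counts (hypothetical), 100 tokens in each corpus.
freq_x <- c(apple = 40, banana = 1, cherry = 59)
freq_y <- c(apple = 20, banana = 4, cherry = 76)

res <- data.frame(
  term   = names(freq_x),
  freq.x = unname(freq_x),
  freq.y = unname(freq_y),
  chi2   = unname(chi2_2x2(freq_x, freq_y,
                           sum(freq_x) - freq_x, sum(freq_y) - freq_y))
)
print(res)
```

In this toy data only `banana` has an expected count below 5, so under "auto" it is the only term whose chi^2 is reduced by the correction. A filter such as `res[res$chi2 >= 3.84, ]` then mimics what a `min_chi2` threshold would do.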

[Package *corpustools* version 0.4.10 Index]