term_union {RNewsflow} | R Documentation
Combine terms in a dtm
Description
Given a dtm and a similarity (adjacency) matrix, merge each cluster of similar terms (simmat > 0) into a single column. The column names of the merged terms are concatenated with a "|" separator (read as OR).
Usage
term_union(dtm, simmat, as_dfm = T, verbose = F, sep = "|", par = NA)
Arguments
dtm | A quanteda dfm or a CsparseMatrix.
simmat | A similarity matrix in CsparseMatrix format, for instance one created with term_char_sim.
as_dfm | If TRUE, return the result as a quanteda dfm.
verbose | If TRUE, report progress.
sep | The separator used for pasting the terms together.
par | If TRUE, add parentheses to the column names before combining. This is mainly for internal use, as it keeps the grouping unambiguous when OR (term_union) and AND (term_intersect) operations are combined. If NA, this is decided based on whether parentheses are already present.
Value
A CsparseMatrix or quanteda dfm
Examples
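# Four short texts containing spelling variants of the same name
dfm = quanteda::tokens(c('That guy Gadaffi', 'Do you mean Kadaffi?',
                         'Nah more like Gadaffel', 'Not Kadaffel?')) |>
  quanteda::dfm()
# Similarity matrix for the terms (the column names of the dfm)
simmat = term_char_sim(colnames(dfm), same_start = 0)
# Merge the columns of similar terms into single "a|b" columns
term_union(dfm, simmat, verbose = FALSE)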
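
The grouping described under Description can be illustrated with a small base R sketch. This is not the RNewsflow implementation: it uses dense matrices, made-up counts and similarities, and it assumes that a "cluster" is a connected component of the graph defined by simmat > 0. The object names (m, sim, label, combined) are hypothetical and chosen only for this illustration.

```r
# Conceptual sketch only; NOT how term_union is implemented.
# Toy document-term matrix: 3 documents, 4 terms (made-up counts).
m <- matrix(c(1, 0, 0, 2,
              0, 1, 0, 0,
              0, 0, 3, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("gadaffi", "kadaffi", "gadaffel", "other")))

# Hypothetical symmetric similarity matrix: a value > 0 means "similar".
sim <- matrix(0, 4, 4, dimnames = list(colnames(m), colnames(m)))
sim["gadaffi", "kadaffi"]  <- sim["kadaffi", "gadaffi"]  <- 1
sim["gadaffi", "gadaffel"] <- sim["gadaffel", "gadaffi"] <- 1

# Assumption: clusters are the connected components of the graph sim > 0.
adj <- sim > 0
diag(adj) <- TRUE  # every term is trivially similar to itself

# Simple label propagation: each term repeatedly takes the smallest
# label among itself and its neighbours until nothing changes.
label <- seq_len(ncol(adj))
repeat {
  new_label <- apply(adj, 2, function(nb) min(label[nb]))
  if (all(new_label == label)) break
  label <- new_label
}

# Sum the columns within each cluster and paste their names with "|".
groups <- split(colnames(m), label)
combined <- sapply(groups, function(terms) rowSums(m[, terms, drop = FALSE]))
colnames(combined) <- vapply(groups, paste, character(1),
                             collapse = "|", USE.NAMES = FALSE)
combined
```

In this toy case the three spelling variants end up in one column named "gadaffi|kadaffi|gadaffel", while "other" is left unchanged. The order of terms within a merged name, and the clustering rule itself, may differ in the actual function.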
[Package RNewsflow version 1.2.8 Index]