textmodel_newsmap {newsmap} | R Documentation |
Semi-supervised Bayesian multinomial model for geographical document classification
Description
Train a Newsmap model to predict the geographical focus of documents, using labels given by a dictionary.
Usage
textmodel_newsmap(
  x,
  y,
  label = c("all", "max"),
  smooth = 1,
  drop_label = TRUE,
  verbose = quanteda_options("verbose"),
  entropy = c("none", "global", "local", "average"),
  ...
)
Arguments
x |
a dfm or fcm created by |
y |
a dfm or a sparse matrix that records the class membership of the
documents. It can be created by applying |
label |
if "max", uses only labels for the maximum value in each row of
|
smooth |
a value added to the frequency of words to smooth likelihood ratios. |
drop_label |
if |
verbose |
if |
entropy |
[experimental] the scheme used to compute the entropy that regularizes
likelihood ratios. The entropy of features is computed over labels if
|
... |
additional arguments passed to internal functions. |
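As a rough illustration of the label argument, the sketch below shows in base R what keeping only the maximum value in each row can look like. It assumes that the rows in question are those of the label matrix y. The toy matrix and the helper keep_max() are hypothetical and are not part of the package, and the sketch keeps ties, which the description above does not specify.

```r
# Toy label matrix: rows are documents, columns are class labels (hypothetical data).
y <- matrix(c(2, 1, 0,
              0, 3, 3,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"), c("ie", "kr", "us")))

# Hypothetical helper: keep only entries equal to the (positive) row maximum
# and set every other entry to zero. Ties are kept in this sketch.
keep_max <- function(m) {
  row_max <- apply(m, 1, max)
  m * (m == row_max & row_max > 0)
}

y_max <- keep_max(y)
print(y_max)
```

With label = "all", by contrast, every non-zero entry of the matrix would remain available as a label.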
Details
Newsmap learns associations between words and classes as likelihood
ratios, based on the features in x and the labels in y. Large
likelihood ratios tend to concentrate in a small number of features, but the
entropy of their frequencies over labels or documents helps to disperse the
distribution.
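The idea can be sketched in base R on toy data. This is a minimal illustration, not the package's internal code. It assumes a likelihood ratio of the form log(P(word | class) / P(word | other classes)) with additive smoothing, and all object names and values are hypothetical. How the entropy enters the final scores is not described here, so the sketch stops at computing it.

```r
# Toy feature matrix x (documents x words) and label matrix y (documents x classes).
x <- matrix(c(3, 0, 1,
              2, 0, 1,
              0, 4, 1,
              0, 3, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("dublin", "seoul", "the")))
y <- matrix(c(1, 0,
              1, 0,
              0, 1,
              0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("ie", "kr")))

smooth <- 1  # value added to word frequencies, as in the smooth argument

# Numeric indicator of which documents carry which label.
label_on <- (y > 0) * 1

# Word frequencies inside and outside each class (classes x words).
freq_in  <- t(label_on) %*% x
freq_out <- t(1 - label_on) %*% x

# Smoothed relative frequencies; each row is divided by its own total.
p_in  <- (freq_in + smooth) / rowSums(freq_in + smooth)
p_out <- (freq_out + smooth) / rowSums(freq_out + smooth)

# Assumed form of the likelihood ratio, on the log scale.
lr <- log(p_in / p_out)

# Entropy of each word's frequency distribution over labels:
# zero for words seen under one label only, highest for evenly spread words.
entropy <- apply(freq_in, 2, function(f) {
  p <- f[f > 0] / sum(f)
  -sum(p * log(p))
})

print(round(lr, 3))
print(round(entropy, 3))
```

In this toy data, "dublin" and "seoul" each occur under a single label and so have zero entropy, while "the" is spread evenly over both labels and has the highest entropy.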
References
Kohei Watanabe. 2018. "Newsmap: semi-supervised approach to geographical news classification." Digital Journalism 6(3): 294-309.
Examples
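library(quanteda)
library(newsmap)  # provides textmodel_newsmap() and data_dictionary_newsmap_en
# Two short example documents
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")
toks_en <- tokens(text_en)
# Label the documents using the third level of the seed dictionary
label_toks_en <- tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3)
label_dfm_en <- dfm(label_toks_en)
# Build the feature matrix without lowercasing the tokens
feat_dfm_en <- dfm(toks_en, tolower = FALSE)
# Fit the model and predict the class of each training document
model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)
predict(model_en)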