dem_sample {conText} | R Documentation |
Randomly sample documents from a dem
Description
Take a random sample of documents from a dem, with or without replacement, with the option to group by a variable in dem@docvars. Note: dem_sample uses dplyr::sample_frac under the hood, so size refers to the fraction of total observations (documents), not their number.
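Because size is passed to dplyr::sample_frac, it is a proportion rather than a count. The base-R sketch below mimics that rule, assuming sample_frac draws round(size * n) rows. The helper sample_frac_rows is hypothetical and is not part of conText or dplyr.

```r
# Hypothetical helper (not part of conText or dplyr): draws row indices
# using the fraction-of-rows rule, assuming round(size * n) rows are drawn.
sample_frac_rows <- function(n, size, replace = FALSE, weight = NULL) {
  # Without replacement you cannot draw more rows than exist
  if (!replace && size > 1) stop("size > 1 requires replace = TRUE")
  sample.int(n, round(size * n), replace = replace, prob = weight)
}

set.seed(1)
length(sample_frac_rows(20, size = 0.5))                 # 10 rows: half of 20
length(sample_frac_rows(20, size = 10, replace = TRUE))  # 200 rows: 10 times 20
```

With size = 10 and replace = TRUE, as in the Examples, the result therefore holds about ten times as many documents as the input (per group when by is used).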
Usage
dem_sample(x, size = NULL, replace = FALSE, weight = NULL, by = NULL)
Arguments
x |
a (dem-class) document-embedding-matrix. |
size |
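<numeric> fraction of total observations (documents) to sample. Values above 1 require replace = TRUE. |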
replace |
Sample with or without replacement? |
weight |
(numeric) Sampling weights. Vector of non-negative numbers, one per document (row) in x. |
by |
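(character or factor vector) either of length 1, giving the name of the grouping variable in dem@docvars as a quoted string, e.g. "party"; or the grouping variable itself, one value per document, e.g. immig_dem@docvars$party (see Examples). |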
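When by is set, dplyr::sample_frac applies size within each group, and weight makes some documents more likely to be drawn than others. The base-R sketch below illustrates both ideas on a toy grouping vector. The helper grouped_sample_rows and the toy party vector are hypothetical; this is not the conText implementation.

```r
# Hypothetical helper (not conText's code): sample a fraction of rows
# within each level of a grouping vector, optionally with weights.
grouped_sample_rows <- function(group, size, replace = FALSE, weight = NULL) {
  if (is.null(weight)) weight <- rep(1, length(group))  # uniform weights
  rows_by_group <- split(seq_along(group), group)
  picked <- lapply(rows_by_group, function(rows) {
    k <- round(size * length(rows))  # fraction applied per group
    # Index into rows instead of sample(rows, ...), which misbehaves
    # when a group contains a single row
    rows[sample.int(length(rows), k, replace = replace, prob = weight[rows])]
  })
  unlist(picked, use.names = FALSE)
}

party <- c(rep("D", 6), rep("R", 4))  # toy grouping variable
set.seed(42)
idx <- grouped_sample_rows(party, size = 0.5)
table(party[idx])  # D: 3, R: 2 (half of each group)

# Documents with zero weight are never drawn
w <- c(rep(0, 3), rep(1, 7))
idx_w <- grouped_sample_rows(party, size = 0.5, weight = w)
all(idx_w > 3)  # TRUE: rows 1 to 3 have weight 0
```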
Value
a (dem-class) document-embedding-matrix of the sampled ALC embeddings, with D columns (the embedding dimension) and one row per sampled document, i.e. roughly size times the number of documents in x. Note that @features in the resulting object corresponds to the original @features; it is not subsetted to the sampled documents. To list the documents that were sampled, access @Dimnames$docs.
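Because sampling can be done with replacement, the list of sampled documents may contain repeats. The sketch below uses a plain base-R matrix as a stand-in for a dem (an assumption made so it runs without conText); in a dem the corresponding names are read from @Dimnames$docs. It shows that D is unchanged while the row names record which documents were drawn and how often.

```r
set.seed(7)
# Stand-in for a dem: 5 documents (rows) x 3 embedding dimensions (D)
emb <- matrix(rnorm(15), nrow = 5,
              dimnames = list(docs = paste0("text", 1:5), NULL))

# Draw twice as many rows as there are documents, with replacement
rows <- sample.int(nrow(emb), round(2 * nrow(emb)), replace = TRUE)
sampled <- emb[rows, , drop = FALSE]

dim(sampled)              # 10 rows, 3 columns: D is unchanged
table(rownames(sampled))  # how often each document was drawn
```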
Examples
library(conText)  # provides dem(), dem_sample() and the cr_* example data
library(quanteda)
# tokenize corpus
toks <- tokens(cr_sample_corpus)
# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)
# build document-feature matrix
immig_dfm <- dfm(immig_toks)
# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)
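# random sample within each party; size is a fraction, so size = 10 with
# replace = TRUE draws about 10 times as many documents per party
immig_wv_party <- dem_sample(immig_dem, size = 10,
                             replace = TRUE, by = "party")
# equivalently, pass the grouping variable itself instead of its name
immig_wv_party <- dem_sample(immig_dem, size = 10,
                             replace = TRUE, by = immig_dem@docvars$party)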