R: Randomly sample documents from a dem

dem_sample {conText}

R Documentation

Randomly sample documents from a dem

Description

Take a random sample of documents from a dem, with or without replacement, optionally grouping by a variable in dem@docvars. Note: dem_sample uses dplyr::sample_frac under the hood, so size refers to the fraction of total observations, not a count of documents.

Usage

dem_sample(x, size = NULL, replace = FALSE, weight = NULL, by = NULL)

Arguments

`x`	a (`dem-class`) document-embedding-matrix
`size`	(numeric) The fraction of documents to select, as passed to `dplyr::sample_frac()`. If `by` is used, `size` applies to each group.
`replace`	Sample with or without replacement?
`weight`	(numeric) Sampling weights. Vector of non-negative numbers of length `nrow(x)`. Weights are automatically standardised to sum to 1 (see `dplyr::sample_frac`). May not be applied when `by` is used.
`by`	(character or factor vector) Either a vector of length 1 giving the name of the grouping variable for sampling, or a vector of length `nrow(x)` giving the group of each document. In the first case the name must be quoted, e.g. `"party"`, and must refer to a variable in `dem@docvars`.

Value

a size x D (dem-class) document-embedding-matrix corresponding to the sampled ALC embeddings. Note that @features in the resulting object corresponds to the original @features; that is, the features are not subsetted to the sampled documents. For a list of the documents that were sampled, inspect the slot @Dimnames$docs.

Examples


library(quanteda)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)

# build document-feature matrix
immig_dfm <- dfm(immig_toks)

# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# to get a random sample
immig_wv_party <- dem_sample(immig_dem, size = 10,
                             replace = TRUE, by = "party")

# also works
immig_wv_party <- dem_sample(immig_dem, size = 10,
                             replace = TRUE, by = immig_dem@docvars$party)
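
# The lines below are an illustrative sketch, not part of the original examples.
# Per the Value section, the sampled documents are listed in @Dimnames$docs
head(immig_wv_party@Dimnames$docs)

# To see how `size` behaves as a fraction, here is dplyr::sample_frac on a
# hypothetical toy data frame (`toy` is an assumption made for illustration):
# with 5 rows per party, size = 0.4 draws 2 rows from each group
library(dplyr)
toy <- data.frame(id = 1:10, party = rep(c("D", "R"), each = 5))
toy %>% group_by(party) %>% sample_frac(size = 0.4)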

[Package conText version 1.4.3 Index]