ncs {conText}

R Documentation

Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.

Description

Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.

Usage

ncs(x, contexts_dem, contexts = NULL, N = 5, as_list = TRUE)

Arguments

`x`	a (quanteda) `dem-class` or `fem-class` object.
`contexts_dem`	a `dem-class` object corresponding to the ALC embeddings of candidate contexts.
`contexts`	a (quanteda) `tokens-class` object of tokenized candidate contexts. Note, these must correspond to the same contexts in `contexts_dem`. If NULL, then the context (document) ids will be output instead of the text.
`N`	(numeric) number of nearest contexts to return.
`as_list`	(logical) if FALSE, all results are combined into a single data.frame. If TRUE, a list of data.frames is returned, with one data.frame per embedding.
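When `contexts` is NULL, the `context` column holds document ids rather than text, so the text has to be joined back in afterwards. The following is a minimal base R sketch of that join, not output from `ncs` itself: the data frames `ncs_ids` and `context_text`, their ids, and their values are all invented for illustration.

```r
# Hypothetical stand-in for ncs() output obtained with contexts = NULL:
# the context column contains document ids (values are invented).
ncs_ids <- data.frame(
  target  = c("D", "D"),
  context = c("text2", "text5"),
  rank    = 1:2,
  value   = c(0.91, 0.87),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table mapping document ids to their text.
context_text <- data.frame(
  context = c("text1", "text2", "text5"),
  text    = c("first context", "second context", "fifth context"),
  stringsAsFactors = FALSE
)

# Left join on the id column, then restore the ranking order.
merged <- merge(ncs_ids, context_text, by = "context", all.x = TRUE)
merged <- merged[order(merged$rank), ]
merged
```

Using `all.x = TRUE` keeps every ranked context even if an id has no matching text, which makes missing matches visible as NA instead of silently dropping rows.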

Value

a data.frame or list of data.frames (one for each target) with the following columns:

target: (character) rownames of x, the labels of the ALC embeddings. NA if is.null(rownames(x)).
context: (character) contexts collapsed into single documents (i.e. untokenized). If contexts is NULL then this variable will show the context (document) ids which you can use to merge.
rank: (character) rank of context in terms of similarity with x.
value: (numeric) cosine similarity between x and context.
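To make the `rank` and `value` columns concrete, here is a small base R sketch of ranking candidate contexts by cosine similarity to a single target embedding. It is an illustration under assumed toy data, not the package's implementation: the helper `cos_sim`, the matrices, and the labels are hypothetical, and column types are not meant to match the real output exactly.

```r
# Toy data: one target embedding and four candidate context embeddings
# in three dimensions. All values are invented for illustration.
x <- matrix(c(1, 0, 0), nrow = 1, dimnames = list("target_a", NULL))
contexts_dem <- matrix(
  c( 1, 0, 0,
     0, 1, 0,
     1, 1, 0,
    -1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), NULL)
)

# Hypothetical helper: cosine similarity between a vector v
# and every row of a matrix m.
cos_sim <- function(v, m) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

N <- 2
sims <- cos_sim(x[1, ], contexts_dem)

# Indices of the N most similar contexts, most similar first.
top <- order(sims, decreasing = TRUE)[seq_len(N)]

result <- data.frame(
  target  = rownames(x)[1],
  context = rownames(contexts_dem)[top],
  rank    = seq_len(N),
  value   = sims[top],
  stringsAsFactors = FALSE
)
result
```

In this toy case the top two contexts are `doc1` (identical direction to the target) and `doc3` (at 45 degrees to it).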

Examples


library(quanteda)
library(conText)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*",
                             window = 6L, rm_keyword = FALSE)

# build document-feature matrix
immig_dfm <- dfm(immig_toks)

# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# to get group-specific embeddings, average within party
immig_wv_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)

# find nearest contexts by party
# with as_list = TRUE, a list of data.frames is returned, one per group;
# setting as_list = FALSE instead combines each group's
# results into a single data.frame (useful for joint plotting)
ncs(x = immig_wv_party, contexts_dem = immig_dem,
    contexts = immig_toks, N = 5, as_list = TRUE)
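If a list has already been produced with `as_list = TRUE`, it can be stacked into one data.frame after the fact. The sketch below assumes the list elements share the columns described under Value; the object `ncs_list` is a hypothetical stand-in with invented values, not real `ncs` output.

```r
# Hypothetical stand-in for a list returned with as_list = TRUE:
# one data.frame per target (here, two parties), values invented.
ncs_list <- list(
  D = data.frame(target = "D", context = c("doc3", "doc1"),
                 rank = 1:2, value = c(0.90, 0.80),
                 stringsAsFactors = FALSE),
  R = data.frame(target = "R", context = c("doc2", "doc4"),
                 rank = 1:2, value = c(0.85, 0.70),
                 stringsAsFactors = FALSE)
)

# Stack the per-target data.frames into a single data.frame.
ncs_df <- do.call(rbind, ncs_list)
rownames(ncs_df) <- NULL

# The combined form makes it easy to filter or plot across targets.
subset(ncs_df, target == "R")
```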

[Package conText version 1.4.3 Index]