ncs {conText}    R Documentation

Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.

Description

For each embedding in x, ranks the candidate contexts by cosine similarity between the embedding and the contexts' embeddings (contexts_dem), and returns the top N nearest contexts.

Usage

ncs(x, contexts_dem, contexts = NULL, N = 5, as_list = TRUE)

Arguments

x

a (quanteda) dem-class or fem-class object.

contexts_dem

a dem-class object corresponding to the ALC embeddings of candidate contexts.

contexts

a (quanteda) tokens-class object of tokenized candidate contexts. Note, these must correspond to the same contexts in contexts_dem. If NULL, then the context (document) ids will be output instead of the text.

N

(numeric) number of nearest contexts to return

as_list

(logical) if FALSE, all results are combined into a single data.frame. If TRUE, a list of data.frames is returned, with one data.frame per embedding.

Value

a data.frame or list of data.frames (one for each target) with the following columns:

target

(character) rownames of x, the labels of the ALC embeddings. NA if is.null(rownames(x)).

context

(character) contexts collapsed into single documents (i.e. untokenized). If contexts is NULL, this column instead holds the context (document) ids, which can be used to merge the results with the corresponding texts.

rank

(character) rank of context in terms of similarity with x.

value

(numeric) cosine similarity between x and context.
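
The ranking described above can be illustrated with a minimal base R sketch. This is not the conText implementation: the helper nearest_contexts_sketch() and the toy matrices are hypothetical, and it assumes that plain numeric matrices stand in for the dem-class objects and that the combined output is a simple row-bind of the per-target results. Column names mirror the Value section; column types are the sketch's own choice.

```r
# Hypothetical helper (not part of conText): rank the rows of `contexts_mat`
# by cosine similarity to each row of `x_mat` and keep the top N.
nearest_contexts_sketch <- function(x_mat, contexts_mat, N = 5, as_list = TRUE) {
  # scale every row to unit length so that a dot product is a cosine similarity
  unit_rows <- function(m) m / sqrt(rowSums(m^2))
  sims <- unit_rows(x_mat) %*% t(unit_rows(contexts_mat))  # targets x contexts

  # labels: NA targets when x has no rownames, positional ids for contexts
  targets <- rownames(x_mat)
  if (is.null(targets)) targets <- rep(NA_character_, nrow(x_mat))
  context_ids <- rownames(contexts_mat)
  if (is.null(context_ids)) context_ids <- as.character(seq_len(nrow(contexts_mat)))

  result <- lapply(seq_len(nrow(x_mat)), function(i) {
    # indices of the N most similar contexts for target i
    top <- head(order(sims[i, ], decreasing = TRUE), N)
    data.frame(target = targets[i],
               context = context_ids[top],
               rank = seq_along(top),
               value = unname(sims[i, top]),
               stringsAsFactors = FALSE)
  })

  if (as_list) {
    if (!anyNA(targets)) names(result) <- targets
    return(result)
  }
  # assumed equivalent of as_list = FALSE: stack the per-target results
  do.call(rbind, result)
}

# toy data: two target embeddings and three candidate context embeddings
x_mat <- rbind(D = c(1, 0), R = c(0, 1))
contexts_mat <- rbind(doc1 = c(1, 0.1), doc2 = c(0.2, 1), doc3 = c(1, 1))

res <- nearest_contexts_sketch(x_mat, contexts_mat, N = 2)
res_df <- nearest_contexts_sketch(x_mat, contexts_mat, N = 2, as_list = FALSE)
```

Here res is a list with one data.frame per target, while res_df holds the same rows in a single data.frame, which is the shape suited to joint plotting.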

Examples


library(quanteda)

# tokenize corpus
toks <- tokens(cr_sample_corpus)

# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*",
                             window = 6L, rm_keyword = FALSE)

# build document-feature matrix
immig_dfm <- dfm(immig_toks)

# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)

# to get group-specific embeddings, average within party
immig_wv_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)

# find nearest contexts by party
# as_list = TRUE returns a list with one data.frame per party;
# setting as_list = FALSE instead combines each group's
# results into a single data.frame (useful for joint plotting)
ncs(x = immig_wv_party, contexts_dem = immig_dem,
    contexts = immig_toks, N = 5, as_list = TRUE)

[Package conText version 1.4.3 Index]