ncs {conText} | R Documentation |
Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.
Description
Given a set of embeddings and a set of tokenized contexts, find the top N nearest contexts.
Usage
ncs(x, contexts_dem, contexts = NULL, N = 5, as_list = TRUE)
Arguments
x |
a (quanteda) |
contexts_dem |
a |
contexts |
a (quanteda) |
N |
(numeric) number of nearest contexts to return |
as_list |
(logical) if FALSE, all results are combined into a single data.frame. If TRUE, a list of data.frames is returned, with one data.frame per embedding. |
Value
a data.frame or list of data.frames (one for each target) with the following columns:
target |
(character) rownames of x, the labels of the ALC embeddings. NA if is.null(rownames(x)). |
context |
(character) contexts collapsed into single documents (i.e. untokenized). If contexts is NULL then this variable will show the context (document) ids, which you can use to merge. |
rank |
(character) rank of context in terms of similarity with x. |
value |
(numeric) cosine similarity between x and context. |
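To make the rank and value columns concrete, the following is a minimal base R sketch of ranking candidate contexts by cosine similarity to each target embedding. The matrices, the helper names (unit_rows, top_n_contexts) and all numbers are hypothetical toy values chosen for illustration; this is not the conText implementation and it does not use dem objects.

```r
# Toy sketch (base R only): rank candidate contexts by cosine similarity
# to each target embedding. All names and numbers are made up for
# illustration; this is not the conText implementation.

# two target embeddings (rows) in a 3-dimensional space
targets <- rbind(D = c(1, 0, 0),
                 R = c(0, 1, 0))

# four candidate context embeddings, with document ids as rownames
contexts_emb <- rbind(text1 = c(0.9, 0.1, 0.0),
                      text2 = c(0.1, 0.8, 0.1),
                      text3 = c(0.5, 0.5, 0.0),
                      text4 = c(0.0, 0.0, 1.0))

# scale each row to unit length so that a dot product is a cosine similarity
unit_rows <- function(m) m / sqrt(rowSums(m^2))

# cosine similarity matrix: one row per target, one column per context
cos_sim <- unit_rows(targets) %*% t(unit_rows(contexts_emb))

# for each target, keep the N most similar contexts
top_n_contexts <- function(sim, N = 2) {
  lapply(rownames(sim), function(tg) {
    ord <- order(sim[tg, ], decreasing = TRUE)[seq_len(min(N, ncol(sim)))]
    data.frame(target  = tg,
               context = colnames(sim)[ord],
               rank    = seq_along(ord),
               value   = unname(sim[tg, ord]),
               stringsAsFactors = FALSE)
  })
}

toy_ncs <- top_n_contexts(cos_sim, N = 2)
names(toy_ncs) <- rownames(cos_sim)
toy_ncs
```

In this toy setup the context column holds document ids rather than text, which mirrors the documented behaviour when contexts is NULL.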
Examples
library(conText)
library(quanteda)
# tokenize corpus
toks <- tokens(cr_sample_corpus)
# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*",
                             window = 6L, rm_keyword = FALSE)
# build document-feature matrix
immig_dfm <- dfm(immig_toks)
# construct document-embedding-matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
                 transform = TRUE, transform_matrix = cr_transform,
                 verbose = FALSE)
# to get group-specific embeddings, average within party
immig_wv_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)
# find nearest contexts by party
# as_list = TRUE returns a list with one data.frame per party;
# setting as_list = FALSE instead combines each group's
# results into a single data.frame (useful for joint plotting)
ncs(x = immig_wv_party, contexts_dem = immig_dem,
    contexts = immig_toks, N = 5, as_list = TRUE)
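If results were requested as a list and a single data.frame is wanted afterwards, the per-target data.frames can be stacked with base R. The sketch below uses a hand-built toy list with the documented columns; the contexts and values are invented placeholders rather than real ncs() output, and this is not necessarily how ncs() combines results internally when as_list = FALSE.

```r
# Toy stand-in for a list returned with as_list = TRUE: one data.frame per target.
# Contexts and values are invented placeholders, not real ncs() output.
res_list <- list(
  D = data.frame(target = "D", context = c("text1", "text3"),
                 rank = 1:2, value = c(0.99, 0.71), stringsAsFactors = FALSE),
  R = data.frame(target = "R", context = c("text2", "text3"),
                 rank = 1:2, value = c(0.98, 0.71), stringsAsFactors = FALSE)
)

# stack the per-target data.frames into one long data.frame
res_df <- do.call(rbind, res_list)
rownames(res_df) <- NULL

# the target column still identifies the group, e.g. for grouped plots
split(res_df$context, res_df$target)
```

Keeping the target column in the stacked result is what makes joint plotting or grouped summaries straightforward.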