cos_sim {conText} | R Documentation
Compute the cosine similarity between one or more ALC embeddings and a set of features.
Description
Compute the cosine similarity between one or more ALC embeddings and a set of features.
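As a rough illustration of the quantity being computed, the base-R sketch below treats the ALC embeddings as rows of a numeric matrix and compares each row with the pre-trained vector of each requested feature. It is a minimal sketch under stated assumptions, not conText's implementation: the function name cos_sim_sketch is hypothetical, and stemming, messages and the list output are left out.

```r
# Hypothetical helper (not part of conText): cosine similarity between
# each ALC embedding (rows of x) and the pre-trained embedding of each feature.
cos_sim_sketch <- function(x, pre_trained, features) {
  x <- as.matrix(x)
  # keep only features that have a pre-trained embedding
  features <- intersect(features, rownames(pre_trained))
  f <- pre_trained[features, , drop = FALSE]
  # L2-normalise rows so that a dot product equals cosine similarity
  x_norm <- x / sqrt(rowSums(x^2))
  f_norm <- f / sqrt(rowSums(f^2))
  sims <- x_norm %*% t(f_norm)  # targets x features
  # unnamed embeddings get an NA target label
  targets <- if (is.null(rownames(x))) rep(NA_character_, nrow(x)) else rownames(x)
  # long format: one row per (target, feature) pair
  data.frame(
    target = rep(targets, times = ncol(sims)),
    feature = rep(colnames(sims), each = nrow(sims)),
    value = as.vector(sims),
    stringsAsFactors = FALSE
  )
}

# Toy example with made-up 2-dimensional embeddings
pre <- rbind(reform = c(1, 0), enforcement = c(0, 1))
emb <- rbind(A = c(1, 1), B = c(0, 2))
cos_sim_sketch(emb, pre, c("reform", "enforcement"))
```

Normalising rows first and taking a single matrix product computes every target/feature pair at once, which is usually simpler and faster in R than looping over pairs.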
Usage
cos_sim(
x,
pre_trained,
features = NULL,
stem = FALSE,
language = "porter",
as_list = TRUE,
show_language = TRUE
)
Arguments
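x
a (quanteda) dem-class or fem-class object holding one or more ALC embeddings, one per row, such as the output of dem_group() in the example below.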
pre_trained
(numeric) an F x D matrix of pre-trained embeddings, where F is the number of features and D is the embedding dimension. rownames(pre_trained) is the set of features for which there is a pre-trained embedding.
features
(character) features of interest.
stem
(logical) if TRUE, both features and rownames(pre_trained) are stemmed and average cosine similarities are reported.
language
the name of a recognized language, as returned by SnowballC::getStemLanguages(); used for stemming when stem = TRUE.
as_list
(logical) if FALSE, all results are combined into a single data.frame; if TRUE, a list of data.frames is returned, one per feature.
show_language
(logical) if TRUE, print a message stating the language used for stemming.
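To make the effect of stem = TRUE more concrete, the sketch below averages the cosine similarity between one embedding and every pre-trained term whose stem matches the stem of the feature. This is an assumption about the general idea, not conText's code: stemmed_cos_sim_sketch and toy_stem are hypothetical names, and toy_stem is a deliberately crude suffix-stripper standing in for a real Snowball stemmer such as SnowballC::wordStem(words, language = "porter").

```r
# Crude illustrative stemmer (hypothetical): strips a trailing "ing" or "s".
# A real analysis would use a Snowball stemmer instead.
toy_stem <- function(words) sub("(ing|s)$", "", words)

# Hypothetical sketch: average cosine similarity between one embedding
# vector x and all pre-trained terms sharing the feature's stem.
stemmed_cos_sim_sketch <- function(x, pre_trained, feature, stem_fun = toy_stem) {
  cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  # pre-trained terms whose stem equals the feature's stem
  matches <- rownames(pre_trained)[stem_fun(rownames(pre_trained)) == stem_fun(feature)]
  if (length(matches) == 0) return(NA_real_)
  sims <- vapply(matches, function(w) cosine(x, pre_trained[w, ]), numeric(1))
  mean(sims)
}

# Toy example: "reform" and "reforms" share a stem, so both enter the average
pre <- rbind(reform = c(1, 0), reforms = c(1, 1), enforcement = c(0, 1))
stemmed_cos_sim_sketch(c(1, 0), pre, "reform")
```

Because every matching term counts equally, a noisy or misspelled variant in pre_trained that happens to share a stem can pull the average noticeably, so it is worth checking which terms match.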
Value
a data.frame, or a list of data.frames (one for each target), with the following columns:
target
(character) rownames of x, the labels of the ALC embeddings; NA if is.null(rownames(x)).
feature
(character) feature terms defined in the features argument.
value
(numeric) cosine similarity between x and feature.
Examples
library(quanteda)
library(conText)
# tokenize corpus
toks <- tokens(cr_sample_corpus)
# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)
# build document-feature matrix
immig_dfm <- dfm(immig_toks)
# construct a document-embedding matrix
immig_dem <- dem(immig_dfm, pre_trained = cr_glove_subset,
transform = TRUE, transform_matrix = cr_transform, verbose = FALSE)
# to get group-specific embeddings, average within party
immig_wv_party <- dem_group(immig_dem, groups = immig_dem@docvars$party)
# compute the cosine similarity between each party's embedding and a specific set of features
cos_sim(x = immig_wv_party, pre_trained = cr_glove_subset,
features = c('reform', 'enforcement'), as_list = FALSE)
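The combined data.frame returned with as_list = FALSE is in long format (one row per target and feature). The base-R sketch below shows two common ways to reshape such a result; it uses a small toy data.frame with made-up values and hypothetical target labels A and B, not actual output from the example above.

```r
# Toy stand-in for the data.frame returned with as_list = FALSE;
# targets and similarity values are made up for illustration only.
res <- data.frame(
  target = c("A", "B", "A", "B"),
  feature = c("reform", "reform", "enforcement", "enforcement"),
  value = c(0.42, 0.31, 0.18, 0.37),
  stringsAsFactors = FALSE
)

# Wide target-by-feature table of cosine similarities
wide <- xtabs(value ~ target + feature, data = res)
wide

# One data.frame per feature (use res$target to group by target instead)
by_feature <- split(res, res$feature)
names(by_feature)
```

xtabs() sums value within each target/feature cell, which here simply places each similarity in its cell because every pair occurs once.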