get_seq_cos_sim {conText} | R Documentation |
Calculate cosine similarities between a target word and candidate words over a sequenced variable using the ALC embedding approach
Description
Calculate cosine similarities between a target word and candidate words over a sequenced variable using the ALC ('a la carte') embedding approach.
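As a rough illustration of the quantity being computed, the sketch below builds an ALC-style embedding for one hypothetical target instance and compares it with candidate embeddings. It is a minimal sketch under stated assumptions, not the package's internal code: the embedding values, the feature names, the context words, and the use of an identity matrix in place of a trained transformation matrix are all invented for illustration.

```r
# Toy pretrained embeddings: 4 features x 3 dimensions (values are made up)
pre_trained <- matrix(
  c(0, 1, 0,
    0, 0, 1,
    1, 1, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("rights", "law", "immigration", "border"), NULL)
)

# Assumption: the identity stands in for a trained D x D transformation matrix
transform_matrix <- diag(3)

# Hypothetical context words observed around one target instance
context_words <- c("rights", "law")

# ALC-style embedding: transform the average of the context word embeddings
avg_context <- colMeans(pre_trained[context_words, , drop = FALSE])
alc_embedding <- as.vector(transform_matrix %*% avg_context)

# Cosine similarity between two numeric vectors
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_immigration <- cos_sim(alc_embedding, pre_trained["immigration", ])
sim_border <- cos_sim(alc_embedding, pre_trained["border", ])
```

In this toy setup the context average points in the same direction as the "border" vector, so that candidate scores higher than "immigration". Repeating the computation within each value of the sequenced variable would give one similarity per candidate per group.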
Usage
get_seq_cos_sim(
  x,
  seqvar,
  target,
  candidates,
  pre_trained,
  transform_matrix,
  window = 6,
  valuetype = "fixed",
  case_insensitive = TRUE,
  hard_cut = FALSE,
  verbose = TRUE
)
Arguments
x |
(character) vector - this is the set of documents (corpus) of interest |
seqvar |
an ordered variable, such as a list of dates or ordered ideology scores |
target |
(character) vector - target word |
candidates |
(character) vector of features of interest |
pre_trained |
(numeric) a F x D matrix corresponding to pretrained embeddings. F = number of features and D = embedding dimensions. rownames(pre_trained) = set of features for which there is a pre-trained embedding. |
transform_matrix |
(numeric) a D x D 'a la carte' transformation matrix. D = dimensions of pretrained embeddings. |
window |
(numeric) - defines the size of a context (words around the target). |
valuetype |
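the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. |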
case_insensitive |
logical; if TRUE, ignore case when matching the target pattern |
hard_cut |
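(logical) - if TRUE then a context must have window x 2 tokens; if FALSE it can have window x 2 or fewer (e.g. if a document begins with the target word, the context will have window tokens rather than window x 2). |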
verbose |
(logical) - if TRUE, report the total number of target instances found. |
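The dimension requirements above (pre_trained is F x D, transform_matrix is D x D, and candidates should appear in rownames(pre_trained)) can be checked before calling the function. The sketch below is one possible way to do that; check_alc_inputs is a hypothetical helper, not part of conText, and the toy matrices are made up.

```r
# Hypothetical helper (not part of conText): verify that the inputs are
# conformable and report candidates without a pretrained embedding
check_alc_inputs <- function(pre_trained, transform_matrix, candidates) {
  d <- ncol(pre_trained)
  stopifnot(
    is.numeric(pre_trained),
    is.numeric(transform_matrix),
    nrow(transform_matrix) == d,
    ncol(transform_matrix) == d,
    !is.null(rownames(pre_trained))
  )
  # Candidates that have no row in the pretrained embedding matrix
  setdiff(candidates, rownames(pre_trained))
}

# Toy inputs: 4 features x 3 dimensions, identity as a stand-in transform
toy_embeddings <- matrix(
  rnorm(4 * 3), nrow = 4,
  dimnames = list(c("immigration", "immigrants", "equal", "law"), NULL)
)
toy_transform <- diag(3)

missing_candidates <- check_alc_inputs(
  toy_embeddings, toy_transform, c("immigration", "visa")
)
```

Here missing_candidates contains "visa", the only candidate absent from the toy embedding matrix.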
Value
a data.frame with one column per candidate term, holding the corresponding cosine similarity values, and one column for seqvar.
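For plotting or further modelling, a wide result of this shape is often easier to handle in long format. The sketch below uses a toy stand-in for the returned data.frame; the similarity values are invented and the column name seqvar is an assumption about how the sequence column is labelled.

```r
# Toy stand-in for the returned data.frame (values are made up; the
# column name `seqvar` is an assumption)
cos_simsdf <- data.frame(
  seqvar = 2011:2014,
  immigration = c(0.31, 0.35, 0.40, 0.38),
  immigrants = c(0.28, 0.30, 0.36, 0.41)
)

# Every column other than the sequence column is treated as a candidate
candidate_cols <- setdiff(names(cos_simsdf), "seqvar")

# Stack candidate columns: one row per (seqvar, candidate) pair
long <- data.frame(
  seqvar = rep(cos_simsdf$seqvar, times = length(candidate_cols)),
  candidate = rep(candidate_cols, each = nrow(cos_simsdf)),
  cos_sim = unlist(cos_simsdf[candidate_cols], use.names = FALSE)
)
```

The long data.frame has one row per candidate per value of the sequenced variable, which suits a line plot of cos_sim against seqvar grouped by candidate.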
Examples
library(quanteda)
library(conText)  # provides cr_sample_corpus, cr_glove_subset and cr_transform

# generate the sequence variable (here: year)
docvars(cr_sample_corpus, 'year') <- rep(2011:2014, each = 50)

# cosine similarity between "equal" and each candidate, by year
cos_simsdf <- get_seq_cos_sim(x = cr_sample_corpus,
                              seqvar = docvars(cr_sample_corpus, 'year'),
                              target = "equal",
                              candidates = c("immigration", "immigrants"),
                              pre_trained = cr_glove_subset,
                              transform_matrix = cr_transform)