lma_simets {lingmatch}    R Documentation
Similarity Calculations
Description
Enter a numerical matrix, set of vectors, or set of matrices to calculate similarity per vector.
Usage
lma_simets(a, b = NULL, metric = NULL, group = NULL, lag = 0,
agg = TRUE, agg.mean = TRUE, pairwise = TRUE, symmetrical = FALSE,
mean = FALSE, return.list = FALSE)
Arguments
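a
A vector or matrix. If a vector, b must also be provided. If a matrix and b is missing, each row will be compared. If a matrix and b is not missing, each row will be compared with b or each row of b.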
b
A vector or matrix to be compared with a or rows of a.
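metric
A character or vector of characters at least partially matching one of the available metric names (or 'all' to explicitly include all metrics), or a number or vector of numbers indicating the metric by index:
- jaccard: sum(a & b) / sum(a | b)
- euclidean: 1 / (1 + sqrt(sum((a - b) ^ 2)))
- canberra: mean(1 - abs(a - b) / (a + b))
- cosine: sum(a * b) / sqrt(sum(a ^ 2) * sum(b ^ 2))
- pearson: (mean(a * b) - mean(a) * mean(b)) / sqrt(mean(a ^ 2) - mean(a) ^ 2) / sqrt(mean(b ^ 2) - mean(b) ^ 2)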
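For reference, the base R sketch below computes textbook forms of five similarity metrics, assumed here to correspond to lingmatch's jaccard, euclidean, canberra, cosine, and pearson, for two numeric vectors of equal length. The helper name simple_simets is hypothetical. This is not lingmatch's compiled implementation, and edge cases (zero vectors, or positions where a + b is 0 for canberra) may be handled differently there.

```r
# Hypothetical helper: textbook definitions of five similarity metrics for two
# numeric vectors of equal length. A sketch, not lingmatch's implementation.
simple_simets <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  c(
    # binary overlap: positions non-zero in both, over positions non-zero in either
    jaccard = sum(a & b) / sum(a | b),
    # Euclidean distance rescaled to (0, 1], where 1 means identical vectors
    euclidean = 1 / (1 + sqrt(sum((a - b)^2))),
    # mean of 1 minus the Canberra terms; NaN wherever a + b == 0
    canberra = mean(1 - abs(a - b) / (a + b)),
    # dot product over the product of vector lengths
    cosine = sum(a * b) / sqrt(sum(a^2) * sum(b^2)),
    # population-moment form of the Pearson correlation (equals cor(a, b))
    pearson = (mean(a * b) - mean(a) * mean(b)) /
      sqrt(mean(a^2) - mean(a)^2) / sqrt(mean(b^2) - mean(b)^2)
  )
}

a <- c(1, 2, 0, 3)
b <- c(2, 1, 1, 3)
simple_simets(a, b)
```

Because the n terms cancel, the population-moment Pearson form gives the same value as cor(a, b). That makes it a convenient check on the formula.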
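group
If b is missing and a has multiple rows, this will be used to make comparisons between rows of a, as modified by agg and agg.mean.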
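lag
Amount to adjust the b index; either rows if b has multiple rows (e.g., for lag = 1, a[1, ] is compared with b[2, ]), or values otherwise (e.g., for lag = 1, a[1] is compared with b[2]). If b is not supplied, b is a copy of a, resulting in lagged self-comparisons or autocorrelations.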
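The base R sketch below illustrates a lagged row-wise comparison. It assumes lag shifts the b index so that, with lag = 1, row 1 of a is compared with row 2 of b, and it uses cosine similarity only. The helper lagged_cosine is hypothetical. Returning NA for rows whose partner falls past the end of b, and rejecting negative lags, are choices made for this sketch, not documented lma_simets behavior.

```r
# Hypothetical helper: cosine similarity between row i of `a` and row i + lag
# of `b`. When `b` is omitted it defaults to `a`, giving lagged self-comparisons.
# Rows whose partner would fall outside `b` are left as NA (a sketch choice).
lagged_cosine <- function(a, b = a, lag = 0) {
  stopifnot(nrow(a) == nrow(b), lag >= 0)
  n <- nrow(a)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    j <- i + lag
    if (j <= n) {
      out[i] <- sum(a[i, ] * b[j, ]) / sqrt(sum(a[i, ]^2) * sum(b[j, ]^2))
    }
  }
  out
}

m <- rbind(c(1, 0, 1), c(1, 1, 0), c(1, 0, 1))
lagged_cosine(m, lag = 1)  # row 1 vs row 2, row 2 vs row 3, then NA
```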
agg
Logical: if FALSE, only the boundary rows between groups will be compared; see the examples.
agg.mean
Logical: if FALSE, aggregated rows are summed instead of averaged.
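The grouped comparison in the examples (consecutive rows from the same group averaged by default, or only the boundary rows used when agg = FALSE) can be sketched in base R as follows. The helper group_cosine is hypothetical and uses cosine similarity only. It assumes each run of identical group labels is collapsed into one vector (mean, or sum when agg.mean = FALSE) and compared with the next run. It returns one value per pair of adjacent runs, which may not match the shape lma_simets returns.

```r
# Hypothetical helper: compare adjacent runs of identical group labels.
# agg = TRUE collapses each run with colMeans (or colSums if agg.mean = FALSE);
# agg = FALSE compares the last row of one run with the first row of the next.
group_cosine <- function(m, group, agg = TRUE, agg.mean = TRUE) {
  runs <- rle(as.character(group))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
  k <- length(ends)
  if (k < 2) return(numeric(0))  # nothing to compare with a single run
  vapply(seq_len(k - 1), function(r) {
    if (agg) {
      f <- if (agg.mean) colMeans else colSums
      x <- f(m[starts[r]:ends[r], , drop = FALSE])
      y <- f(m[starts[r + 1]:ends[r + 1], , drop = FALSE])
    } else {
      x <- m[ends[r], ]        # boundary row at the end of run r
      y <- m[starts[r + 1], ]  # boundary row at the start of run r + 1
    }
    cosine(x, y)
  }, numeric(1))
}

m <- rbind(c(1, 0, 2), c(1, 1, 0), c(0, 1, 1), c(0, 2, 1))
speaker <- c("A", "A", "B", "B")
group_cosine(m, speaker)               # mean of rows 1-2 vs mean of rows 3-4
group_cosine(m, speaker, agg = FALSE)  # row 2 vs row 3
```

Cosine similarity ignores vector length, so summing and averaging give the same cosine here. The agg.mean choice matters for length-sensitive metrics such as euclidean.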
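pairwise
Logical: if FALSE and a and b are matrices with the same number of rows, only paired rows are compared. Otherwise (and if only a is supplied), all pairwise comparisons are made.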
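symmetrical
Logical: if TRUE and pairwise comparisons between a rows were made, the results in the lower triangle are copied to the upper triangle.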
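The difference between comparing all pairs of rows and comparing paired rows only can be sketched in base R with cosine similarity. The helpers row_normalize, pairwise_cosine, and paired_cosine are hypothetical. They return dense base R results rather than the sparse matrices lma_simets returns, and they do not model the symmetrical option.

```r
# Hypothetical helpers contrasting the two comparison modes with cosine
# similarity. Rows with zero length would produce NaN (not handled here).

# scale each row to unit length (the vector recycles down columns, so
# element [i, j] is divided by the length of row i)
row_normalize <- function(m) m / sqrt(rowSums(m^2))

# all pairwise comparisons: an nrow(a) x nrow(b) matrix
pairwise_cosine <- function(a, b = a) {
  tcrossprod(row_normalize(a), row_normalize(b))
}

# paired rows only: a[i, ] vs b[i, ], one value per row
paired_cosine <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))
  rowSums(row_normalize(a) * row_normalize(b))
}

a <- rbind(c(1, 0, 1), c(0, 1, 1))
b <- rbind(c(1, 0, 1), c(1, 1, 0))
pairwise_cosine(a, b)  # 2 x 2: every row of a against every row of b
paired_cosine(a, b)    # length 2: a[1, ] vs b[1, ], a[2, ] vs b[2, ]
```

The paired result is the diagonal of the full pairwise matrix. Comparing only paired rows avoids computing the off-diagonal cells when they are not needed.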
mean
Logical: if TRUE, a single mean for each metric is returned per row of a.
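return.list
Logical: if TRUE, a list-like object will always be returned, with an entry for each metric, even when only one metric is requested.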
Details
Use RcppParallel::setThreadOptions to change parallelization options; e.g., run RcppParallel::setThreadOptions(4) before a call to lma_simets to set the number of CPU threads to 4.
Value
Output varies based on the dimensions of a and b:
- Out: A vector with a value per metric.
  In: Only when a and b are both vectors.
- Out: A vector with a value per row.
  In: Any time a single value is expected per row: a or b is a vector, a and b are matrices with the same number of rows and pairwise = FALSE, a group is specified, or mean = TRUE, and only one metric is requested.
- Out: A data.frame with a column per metric.
  In: When multiple metrics are requested in the previous case.
- Out: A sparse matrix with a metric attribute with the metric name.
  In: Pairwise comparisons within an a matrix or between an a and b matrix, when only 1 metric is requested.
- Out: A list with a sparse matrix per metric.
  In: When multiple metrics are requested in the previous case.
Examples
text <- c(
"words of speaker A", "more words from speaker A",
"words from speaker B", "more words from speaker B"
)
(dtm <- lma_dtm(text))
# compare each entry
lma_simets(dtm)
# compare each entry with the mean of all entries
lma_simets(dtm, colMeans(dtm))
# compare by group (corresponding to speakers and turns in this case)
speaker <- c("A", "A", "B", "B")
## by default, consecutive rows from the same group are averaged:
lma_simets(dtm, group = speaker)
## with agg = FALSE, only the rows at the boundary between
## groups (rows 2 and 3 in this case) are used:
lma_simets(dtm, group = speaker, agg = FALSE)