workflowImportance {distantia} | R Documentation |
Computes the contribution to dissimilarity of each variable.
Description
This workflow executes the following steps:
- Computes psi as done by workflowPsi.
- Computes psi as many times as there are numeric variables in sequences, removing one of them each time (jackknife analysis), to compute the relative contribution of each variable to overall dissimilarity.
- Delivers an output of type "list" with two slots:
  - psi: a dataframe with the columns "A" and "B" with the respective names of the sequences compared, a column named "All variables" with the psi values of each pair of sequences computed by considering all variables, and then one column per variable, indicating the psi value when that variable is removed.
  - psi.drop: a dataframe with the columns "A" and "B", and then one column per numeric variable in sequences indicating the percentage drop in psi (relative to the "All variables" column in the psi dataframe) when the given variable is removed. Positive values indicate that psi decreases when the variable is removed, making the sequences more similar, while negative values indicate that psi increases when the variable is removed, making the sequences more different.
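The leave-one-variable-out logic described above can be sketched in base R. This is only an illustration under stated assumptions: toy_dissimilarity is a hypothetical stand-in measure, not the psi computed by workflowPsi, the two sequences are made-up data with samples paired by row, and the percentage formula is inferred from the description of psi.drop.

```r
# Hypothetical stand-in for psi (NOT the distantia implementation):
# Manhattan distance between row-paired samples, divided by the total
# change between consecutive samples within each sequence.
toy_dissimilarity <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  between <- sum(abs(a - b))
  within <- sum(abs(diff(a))) + sum(abs(diff(b)))
  between / within
}

# Two made-up sequences with the same variables and number of samples.
seq_a <- data.frame(x = c(1, 2, 3), y = c(10, 12, 11), z = c(5, 5, 6))
seq_b <- data.frame(x = c(2, 3, 5), y = c(30, 10, 25), z = c(5, 6, 6))

variables <- names(seq_a)

# Dissimilarity computed on all variables.
d_all <- toy_dissimilarity(seq_a, seq_b)

# Dissimilarity recomputed after removing each variable in turn.
d_without <- sapply(variables, function(v) {
  keep <- setdiff(variables, v)
  toy_dissimilarity(seq_a[, keep, drop = FALSE], seq_b[, keep, drop = FALSE])
})

# Percentage drop relative to the all-variables value (assumed formula).
drop_pct <- 100 * (d_all - d_without) / d_all
print(round(drop_pct, 1))
```

In this toy case only y yields a positive drop, so removing it makes the two sequences more similar; removing x or z yields a small negative drop because the normalized measure increases without them.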
Usage
workflowImportance(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  diagonal = FALSE,
  paired.samples = FALSE,
  same.time = FALSE,
  ignore.blocks = FALSE,
  parallel.execution = TRUE
)
Arguments
sequences |
dataframe with multiple sequences identified by a grouping column generated by |
grouping.column |
character string, name of the column in |
time.column |
character string, name of the column with time/depth/rank data. |
exclude.columns |
character string or character vector with column names in |
method |
character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error. |
diagonal |
boolean, if |
paired.samples |
boolean, if |
same.time |
boolean. If |
ignore.blocks |
boolean. If |
parallel.execution |
boolean, if |
Details
A reasonable answer to the question "which variable contributes the most to the dissimilarity between two sequences?" is "the one whose exclusion from the analysis reduces dissimilarity the most". This workflow attempts to reach that answer by computing psi while removing one variable at a time.
Value
A list with two slots named psi and psi.drop. The former contains the dissimilarity values obtained with all variables and when removing each variable, while the latter contains the drop in dissimilarity (as a percentage of psi computed on all variables) that happens when each variable is removed. Positive values indicate that dissimilarity decreases when the variable is removed, while negative values indicate that dissimilarity increases when the variable is removed.
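As a hedged illustration of how the two slots relate, the sketch below builds a hypothetical psi dataframe with the documented layout and derives psi.drop from it. The sequence names, variable names, and numbers are all made up, and the percentage formula is an assumption based on the description above rather than code taken from the package.

```r
# Hypothetical output with the documented layout; all values are invented.
importance <- list(
  psi = data.frame(
    A = c("s1", "s1"),
    B = c("s2", "s3"),
    `All variables` = c(2.0, 4.0),
    v1 = c(1.5, 4.4),
    v2 = c(2.2, 1.0),
    check.names = FALSE
  )
)

# Variable columns are everything except the identifiers and the reference.
vars <- setdiff(names(importance$psi), c("A", "B", "All variables"))
psi_all <- importance$psi[["All variables"]]

# Assumed formula: percentage drop relative to psi computed on all variables.
drop <- lapply(importance$psi[vars], function(p) 100 * (psi_all - p) / psi_all)
importance$psi.drop <- data.frame(importance$psi[c("A", "B")], drop)

# Variable with the largest positive drop for each pair of sequences.
top_variable <- vars[apply(importance$psi.drop[vars], 1, which.max)]
print(importance$psi.drop)
print(top_variable)
```

Under these made-up numbers, v1 contributes the most to the dissimilarity between s1 and s2, and v2 contributes the most between s1 and s3.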
Author(s)
Blas Benito <blasbenito@gmail.com>