prepareSequences {distantia} | R Documentation
Prepare sequences for a comparison analysis.
Description
This function prepares two or more multivariate time series for comparison. It supports two input scenarios:
- Two dataframes: the user provides two separate dataframes, each containing a multivariate time series. These time series can be regular or irregular, aligned or unaligned, but they must share at least a few columns with the same names (watch for differences in case between column names that represent the same entity) and the same units. This mode uses only the following arguments: sequence.A, sequence.A.name (optional), sequence.B, sequence.B.name (optional), and merge.mode.
- One long dataframe: the user provides a single dataframe through the sequences argument, containing two or more multivariate time series identified by the column named in grouping.column.
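The two-dataframe scenario ends up in the same long format as the second scenario: one dataframe with a column identifying each sequence. The base R sketch below illustrates that idea. It is a hypothetical helper, not the package's code, and it assumes that merge.mode = "overlap" keeps only the columns both sequences share, while "complete" keeps every column and leaves the cells missing from one sequence empty (NA).

```r
# Hypothetical helper, NOT part of distantia: stacks two multivariate time
# series into one long dataframe with an "id" column. The "overlap" and
# "complete" branches encode assumed semantics, not the package's code.
stack_two_sequences <- function(a, b, a.name = "A", b.name = "B",
                                merge.mode = c("complete", "overlap")) {
  merge.mode <- match.arg(merge.mode)
  shared <- intersect(names(a), names(b))
  if (length(shared) == 0) stop("the two sequences share no column names")
  # "overlap": keep shared columns only; "complete": keep every column.
  cols <- if (merge.mode == "overlap") shared else union(names(a), names(b))
  # Add columns missing from either input as NA so both have the same layout.
  for (col in setdiff(cols, names(a))) a[[col]] <- NA
  for (col in setdiff(cols, names(b))) b[[col]] <- NA
  out <- rbind(
    data.frame(id = a.name, a[, cols, drop = FALSE], stringsAsFactors = FALSE),
    data.frame(id = b.name, b[, cols, drop = FALSE], stringsAsFactors = FALSE)
  )
  rownames(out) <- NULL
  out
}

# Toy inputs (made-up values): "depth" and "pinus" are shared,
# "quercus" exists only in a and "betula" only in b.
a <- data.frame(depth = 1:3, pinus = c(10, 20, 30), quercus = c(5, 0, 2))
b <- data.frame(depth = 1:2, pinus = c(8, 12), betula = c(1, 4))

stack_two_sequences(a, b, merge.mode = "overlap")   # columns: id, depth, pinus
stack_two_sequences(a, b, merge.mode = "complete")  # quercus and betula NA-filled
```

Matching columns by exact name is why the case of column names matters: "Pinus" and "pinus" would be treated as two different columns.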
Usage
prepareSequences(
sequence.A = NULL,
sequence.A.name = "A",
sequence.B = NULL,
sequence.B.name = "B",
merge.mode = "complete",
sequences = NULL,
grouping.column = NULL,
time.column = NULL,
exclude.columns = NULL,
if.empty.cases = "zero",
transformation = "none",
paired.samples = FALSE,
same.time = FALSE
)
Arguments
sequence.A | dataframe containing a multivariate time-series. |
sequence.A.name | character string with the name of |
sequence.B | dataframe containing a multivariate time-series. Must have overlapping columns with |
sequence.B.name | character string with the name of |
merge.mode | character string, one of: "overlap", "complete" (default option). If "overlap", |
sequences | dataframe with multiple sequences identified by a grouping column. |
grouping.column | character string, name of the column in |
time.column | character string, name of the column with time/depth/rank data. If |
exclude.columns | character string or character vector with column names in |
if.empty.cases | character string with two possible values: "omit", or "zero". If "zero" (default), |
transformation | character string. Defines what data transformation is to be applied to the sequences. One of: "none" (default), "percentage", "proportion", "hellinger", and "scale" (the latter centers and scales the data using the |
paired.samples | boolean. If |
same.time | boolean. If |
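The if.empty.cases and transformation options can be illustrated with a small base R sketch. It is a hypothetical helper, not the package's code, and it assumes that "zero" replaces empty (NA) cells with 0, that "omit" drops rows containing NA, and that "percentage", "proportion" and "hellinger" act row-wise on numeric columns only, using the standard definitions (the Hellinger transformation is the square root of each value divided by its row total). For "scale" it uses base::scale(), which centers and scales each column.

```r
# Hypothetical helper, NOT part of distantia: fills or drops empty cases and
# then applies a transformation to the numeric columns of one sequence.
preprocess_rows <- function(x,
                            if.empty.cases = c("zero", "omit"),
                            transformation = c("none", "percentage",
                                               "proportion", "hellinger",
                                               "scale")) {
  if.empty.cases <- match.arg(if.empty.cases)
  transformation <- match.arg(transformation)
  x <- as.matrix(x)  # assumes every column is numeric
  if (if.empty.cases == "zero") {
    x[is.na(x)] <- 0                                  # assumed: NA -> 0
  } else {
    x <- x[stats::complete.cases(x), , drop = FALSE]  # assumed: drop NA rows
  }
  totals <- rowSums(x)  # rows summing to 0 give NaN in the ratio-based options
  switch(transformation,
    none       = x,
    percentage = x / totals * 100,  # each row sums to 100
    proportion = x / totals,        # each row sums to 1
    hellinger  = sqrt(x / totals),  # square root of the row proportions
    scale      = scale(x)           # base::scale(): column-wise center and scale
  )
}

# Made-up counts with one empty cell.
counts <- data.frame(pinus = c(9, 1, NA), quercus = c(0, 3, 4))
h <- preprocess_rows(counts, if.empty.cases = "zero", transformation = "hellinger")
p <- preprocess_rows(counts, if.empty.cases = "omit", transformation = "proportion")
```

After a Hellinger transformation the squared values of each row sum to 1, which is a quick sanity check on transformed data.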
Value
A dataframe with the multivariate time series. If sequence.A and sequence.B are provided, the column identifying the sequences is named "id". If sequences is provided, the time series are identified by the column named in grouping.column.
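Because the output is an ordinary long-format dataframe, individual sequences can be recovered with base R's split(). The sketch below uses a toy dataframe with made-up values standing in for the output of the two-dataframe mode, where the identifying column is "id".

```r
# Toy long-format dataframe standing in for the output of prepareSequences();
# the values are made up for illustration.
long <- data.frame(
  id = c("A", "A", "A", "B", "B"),
  pinus = c(10, 20, 30, 8, 12),
  quercus = c(5, 0, 2, 1, 4),
  stringsAsFactors = FALSE
)

# One dataframe per sequence, keyed by the values of the "id" column.
by.sequence <- split(long, long$id)
names(by.sequence)          # "A" "B"
sapply(by.sequence, nrow)   # A: 3, B: 2
```

With the single-dataframe mode, the same call works by splitting on the column named in grouping.column instead of "id".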
Author(s)
Blas Benito <blasbenito@gmail.com>
Examples
library(distantia)

# Two sequences as inputs
data(sequenceA)
data(sequenceB)
AB.sequences <- prepareSequences(
sequence.A = sequenceA,
sequence.A.name = "A",
sequence.B = sequenceB,
sequence.B.name = "B",
merge.mode = "complete",
if.empty.cases = "zero",
transformation = "hellinger"
)
#several sequences in a single dataframe
data(sequencesMIS)
MIS.sequences <- prepareSequences(
sequences = sequencesMIS,
grouping.column = "MIS",
if.empty.cases = "zero",
transformation = "hellinger"
)