info.distance.lsd {LSDinterface} | R Documentation |
Compute distance measure between LSD Monte Carlo time series and a set of references
Description
This function reads a 3 or 4-dimensional array produced by read.3d.lsd or read.4d.lsd and computes several types of distance measures between the time series from a set of Monte Carlo runs and a set of reference time series (like the Monte Carlo average or median).
Usage
info.distance.lsd( array, references, instance = 1,
distance = "euclidean", std.dist = FALSE,
std.val = FALSE, rank = FALSE, weights = 1,
seed = 1, ... )
Arguments
array |
a 3D or 4D array as produced by |
references |
a 2D matrix containing the reference time series, time in rows and variable values in named columns, from which the distance measures are to be computed. Columns must be named for the exact match to the names of the desired variables (contained in |
instance |
integer: the instance of the variable to be read, for variables that exist in more than one object (4D |
distance |
string: the distance measure to be used. The default is to compute the Euclidean distance ( |
std.dist |
a logical value indicating, if |
std.val |
a logical value indicating, if |
rank |
a logical value indicating, if |
weights |
a numerical vector containing the weights to be used for each variable in |
seed |
a single value, interpreted as an integer to define the pseudo-random number generator state used when sampling data, or |
... |
additional parameters required by the specific method (see |
Details
This function is a front-end to the extensive TSdist package, interfacing it with LSD-generated data. Please check the associated documentation for further information. The TSdist package provides many alternative distance measures, including several that allow for a different number of time steps among runs and references.
This function may also search for the Monte Carlo run which has the overall smallest (standardized) distances from the given references. Irrespective of the options std.dist and std.val, the search always uses standardized values and distances for computation (this does not affect the distance measure matrix values).
One typical application of distance metrics is to select the runs which are closest to the Monte Carlo average or median, that is, the runs which are most representative of the Monte Carlo experiment. As there is no single criterion to define such "closeness", multiple distance measures may help to identify the set of most interesting runs.
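The selection idea described above can be sketched in base R, without LSDinterface or TSdist. This is only an illustration and not the package's implementation: the toy array mc, its assumed layout (time steps x variables x Monte Carlo runs), and the names ref, distMat and closeMat are all hypothetical.

```r
# Toy Monte Carlo data: 5 time steps, 2 variables, 3 runs
# (assumed layout: time x variable x run; all names are hypothetical)
set.seed( 1 )
mc <- array( rnorm( 5 * 2 * 3 ), dim = c( 5, 2, 3 ),
             dimnames = list( NULL, c( "X", "Y" ),
                              c( "run1", "run2", "run3" ) ) )

# Reference series: the Monte Carlo average, time in rows, variables in columns
ref <- apply( mc, c( 1, 2 ), mean )

# Euclidean distance of each run (rows) to the reference, per variable (columns)
distMat <- t( apply( mc, 3,
                     function( run ) sqrt( colSums( ( run - ref ) ^ 2 ) ) ) )

# Run names sorted by increasing distance, one column per variable
# (closest run in the first row)
closeMat <- apply( distMat, 2, function( d ) names( sort( d ) ) )

distMat
closeMat
```

The first row of closeMat names, for each variable, the run nearest to the average. Repeating the exercise with another distance formula in place of the Euclidean one shows whether the same runs remain the closest.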
Value
Returns a list containing:
dist |
a named matrix containing the distances for each Monte Carlo run (lines) and variables (columns) contained both in |
close |
a named matrix of Monte Carlo run (sample) names, one column per variable, sorted in increasing distance order (closest runs in first line), which can be used to index the 3D or 4D |
rank |
(only if |
Note
When comparing distance measures between different Monte Carlo runs and variables, it is important to standardize the distances and values to ensure consistency. For variables which may present NA values, setting std.dist = TRUE ensures distance comparability by dividing the absolute distance of each run-reference pair by the number of effective (non-NA) time steps. When comparing variables which are dimensionally heterogeneous, std.val = TRUE uses the relative measure (the distance between 1 and the run value divided by the corresponding reference value) to compute the distances.
When setting std.val = TRUE, all points in which the references' values are equal to zero are effectively removed from calculations. This behavior is always applied when searching for the closest Monte Carlo run(s).
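The two standardizations described in this note can be illustrated with a small base R sketch. It is a hand-made approximation under stated assumptions and not the code used by the function: the vectors run and ref are made-up data, and a Manhattan-type (absolute difference) distance is assumed for simplicity.

```r
# Made-up series: one run and its reference (names and values are hypothetical)
run <- c( 2, 4, NA, 8, 5 )
ref <- c( 1, 4, 3, 0, 10 )

# Effective time steps: points where both series are available
valid <- ! is.na( run ) & ! is.na( ref )

# Absolute (Manhattan-type) distance over the effective time steps
absDist <- sum( abs( run[ valid ] - ref[ valid ] ) )

# Idea behind std.dist = TRUE: divide by the number of effective time steps
stdDist <- absDist / sum( valid )

# Idea behind std.val = TRUE: compare the ratio run / reference against 1,
# dropping the points where the reference is zero
ok <- valid & ref != 0
relDist <- sum( abs( run[ ok ] / ref[ ok ] - 1 ) )

c( absDist = absDist, stdDist = stdDist, relDist = relDist )
```

In this toy case the fourth point dominates the absolute distance, yet it contributes nothing to the relative one because its reference value is zero and the point is dropped.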
Author(s)
Marcelo C. Pereira
See Also
read.3d.lsd(), read.4d.lsd(), info.stats.lsd()
Examples
# get the list of file names of example LSD results
files <- list.files.lsd( system.file( "extdata", package = "LSDinterface" ) )
# read first instance of all variables from MC files (3D array)
inst1Array <- read.3d.lsd( files )
# create statistics data frames for the variables
inst1Stats <- info.stats.lsd( inst1Array )
# compute the Euclidean distance to the mean for all variables and runs
inst1dist <- info.distance.lsd( inst1Array, inst1Stats$avg )
inst1dist$dist
inst1dist$close
# the same exercise but for a 4D array and Manhattan distance to the median
# plus indicating the Monte Carlo run closest to the median
allArray <- read.4d.lsd( files )
allStats <- info.stats.lsd( allArray, median = TRUE )
allDist <- info.distance.lsd( allArray, allStats$med, distance = "manhattan",
rank = TRUE )
allDist$dist
allDist$close
allDist$rank
names( allDist$rank )[ 1 ] # results file name of the closest run