read.doe.lsd {LSDsensitivity} | R Documentation |
Read a set of experimental data from a LSD model
Description
This function reads the sampling data produced by an LSD model design of experiment (DoE), pre-processes it, and saves it as an R object that can be used by the other tools provided by the LSDsensitivity package. Optionally, it can be used with a second DoE, on the same simulation model, to allow the out-of-sample (external) validation of the fitted meta-models.
Usage
read.doe.lsd( folder, baseName, outVar = "", does = 1, doeFile = NULL,
respFile = NULL, validFile = NULL, valRespFile = NULL,
confFile = NULL, limFile = NULL, iniDrop = 0, nKeep = -1,
saveVars = NULL, addVars = NULL, eval.vars = NULL,
eval.run = NULL, eval.stat = c( "mean", "median" ),
pool = TRUE, na.rm = FALSE, rm.temp = TRUE, rm.outl = FALSE,
lim.outl = 10, nnodes = 1, quietly = TRUE, instance = 1,
posit = NULL, posit.match = c( "fixed", "glob", "regex" ) )
Arguments
folder |
the relative folder path to the LSD DoE data files, using the R working directory as reference (see |
baseName |
the LSD data files base name, without numbering and extension suffixes (should be the same as the name of the baseline |
outVar |
the name of an existing variable to be used as the reference to perform the sensitivity analysis. If no name is supplied, the default is to use the first element of |
does |
1 or 2: the number of experiments to be processed. Use 2 only when an additional external validation sample (independent from the main sample) is available (see the required files below). The default is 1. |
doeFile |
the DoE specification file to be used. For the default (NULL), the |
respFile |
the DoE response file to be used/created. For the default (NULL), the |
validFile |
the external validation DoE specification file to be used. For the default (NULL), the |
valRespFile |
the external validation DoE response file to be used/created. For the default (NULL), the |
confFile |
the LSD baseline |
limFile |
the LSD factor limit ranges |
iniDrop |
integer: the number of initial time steps to drop from analysis (from |
nKeep |
integer: the total number of time steps to keep after |
saveVars |
a vector of existing LSD variable names to be kept in the data set. The default ( |
addVars |
a vector of new LSD variable names to be added to the data set. The default ( |
eval.vars |
a function to recalculate any item of the imported data set, including added variables. The default (NULL) is to have no function (just use the selected existing variables as is). If defined, the function must take two arguments: the data set for a specific DoE point (time steps in the rows) and the list of variables (columns) in the data set. The function may change any value within the data set but should not add or remove rows or columns. |
eval.run |
a function to evaluate the DoE response for each experimental sampling point, attributing an optional value to it. The default (NULL) is to have no function. In this case, the function uses the selected variable Monte Carlo mean and standard deviation, or the median and the median absolute deviation if |
eval.stat |
character: define the statistics to be used to evaluate the DoE response when |
na.rm |
logical: if TRUE, NA values are stripped before the computation proceeds.
rm.temp |
logical: if |
rm.outl |
logical: if |
lim.outl |
numeric: if |
nnodes |
integer: the maximum number of parallel computing nodes (parallel threads) in the current computer to be used for reading the files. The default, |
quietly |
logical: if |
pool |
logical: if |
instance |
integer: the instance of the variable to be read, for variables that exist in more than one object. This number is based on the relative position (column) of the variable in the results file. The default (1) is to read the first instance. Only a single existing instance at a time can be read for analysis. |
posit |
a string, a vector of strings or an integer vector describing the LSD object position of the variable(s) to select. If an integer vector, it should define the position of a SINGLE LSD object. If a string or vector of strings, each element should define one or more different LSD objects, so the returned matrix may contain variables from more than one object. By setting |
posit.match |
a string defining how the |
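The way iniDrop, nKeep, eval.stat and the outlier options act on the data can be sketched on plain numeric vectors. The helpers below (window.steps, summarize.runs and is.outlier) are hypothetical illustrations and are not functions of LSDsensitivity. The outlier rule in particular is an assumption: it flags values more than lim.outl standard deviations from the mean, based only on the lim.outl comment in the examples, and is not the package's documented algorithm.

```r
# Hypothetical helpers (NOT part of LSDsensitivity) that mimic, on plain
# numeric vectors, what some read.doe.lsd arguments control.

# iniDrop / nKeep: drop the first `iniDrop` time steps, then keep the next
# `nKeep` steps (nKeep = -1 keeps all the remaining steps)
window.steps <- function( x, iniDrop = 0, nKeep = -1 ) {
  if( iniDrop > 0 )
    x <- x[ -seq_len( iniDrop ) ]   # guard: x[ -integer( 0 ) ] would drop all
  if( nKeep >= 0 )
    x <- x[ seq_len( min( nKeep, length( x ) ) ) ]
  x
}

# eval.stat: summarize the values of one DoE point across Monte Carlo runs
# as a (location, dispersion) pair
summarize.runs <- function( runVals, eval.stat = c( "mean", "median" ),
                            na.rm = FALSE ) {
  eval.stat <- match.arg( eval.stat )
  if( eval.stat == "mean" )
    c( mean( runVals, na.rm = na.rm ), sd( runVals, na.rm = na.rm ) )
  else  # stats::mad() is scaled by 1.4826 by default
    c( median( runVals, na.rm = na.rm ), mad( runVals, na.rm = na.rm ) )
}

# rm.outl / lim.outl (ASSUMED rule): flag values lying more than
# `lim.outl` standard deviations away from the mean
is.outlier <- function( x, lim.outl = 10 ) {
  abs( x - mean( x, na.rm = TRUE ) ) > lim.outl * sd( x, na.rm = TRUE )
}

# usage
window.steps( 1 : 10, iniDrop = 2, nKeep = 5 )          # 3 4 5 6 7
summarize.runs( c( 1, 2, 3, 4, 100 ), "median" )        # 3.0000 1.4826
is.outlier( c( 1, 2, 3, 4, 100 ), lim.outl = 1.5 )      # only the last is TRUE
```

The median/MAD pair is much less affected by the single extreme run than the mean/SD pair. That robustness is the usual reason to prefer eval.stat = "median" when some Monte Carlo runs may diverge.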
Details
The function reuses any existing response file(s) (for the main and the optional external validation DoEs), or tries to create them if they do not exist. The response files can be created in relation to any existing, modified or new variable from any simulated time step, including complex combinations of those. New and modified variables (w.r.t. the ones available from LSD) can be easily created by defining an eval.vars(data, varList) function, as shown in the example below. The response values for each sampling point in the DoE(s) can be evaluated using any math/statistical technique over the entire data for each sampled point in every Monte Carlo run by defining an eval.run(data, mc.run, var.idx, ci) function, as in the example below.
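An eval.vars function may change values but must not add or remove rows or columns. It can therefore help to exercise it on a small synthetic data set before handing it to read.doe.lsd. The sketch below assumes the layout described for eval.vars: one DoE point, with time steps in the rows and variables in the columns. The toy matrix and the name my.eval.vars are hypothetical. The function mirrors the log-transform example in the Examples section.

```r
# Hypothetical toy data for ONE DoE point in the pooled layout
# (time steps in rows, variables in columns; var4 is added, still empty)
toy <- cbind( var1 = c( 1, 2, 4 ), var2 = c( 3, 3, 3 ),
              var3 = c( 10, 100, 1000 ), var4 = NA_real_ )
logVars <- c( "var1", "var3" )

# a variant of the eval.vars( ) EXAMPLE 1 function (hypothetical name)
my.eval.vars <- function( dataSet, allVars ) {
  for( var in intersect( allVars, logVars ) )     # log-transform selected vars
    dataSet[ , var ] <- suppressWarnings( log( dataSet[ , var ] ) )
  dataSet[ , "var4" ] <- dataSet[ , "var1" ] + dataSet[ , "var2" ]  # new var
  dataSet
}

out <- my.eval.vars( toy, colnames( toy ) )

# eval.vars may change values but must not add or remove rows or columns
stopifnot( identical( dim( out ), dim( toy ) ),
           identical( colnames( out ), colnames( toy ) ) )
out
```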
Each call to the function can process a single variable. If sensitivity analysis is being performed on multiple variables, the function must be called several times. However, if rm.temp = FALSE, the temporary speedup files are kept and reused, so the processing time from the second variable onward is significantly shortened.
This function requires that the complete set of LSD DoE data files be stored in a single folder/directory. The required files are the following (XX, YY and ZZ are sequential control numbers produced by LSD, i = 0, 1,...):
folder/baseName.lsd : LSD baseline configuration (1 file)
folder/baseName.sa : factor ranges (1 file)
folder/baseName_XX_YY.csv : DoE specification (1 file)
folder/baseName_XX+i.res[.gz] : DoE data (YY-XX+1 files)
folder/baseName_YY+1_ZZ.csv : validation specification (optional - 1 file)
folder/baseName_YY+1+i.res[.gz] : validation data (optional - ZZ-YY files)
The function generates the required response files for the selected analysis variable and produces the following files in the same folder/directory (WWW is the name of the selected analysis variable):
folder/baseName_XX_YY_WWW.csv : DoE response for the selected variable (1 file)
folder/baseName_YY+1_ZZ_WWW.csv : validation response for the selected variable (optional - 1 file)
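The naming patterns above can be turned into a quick pre-flight check of the data folder. doe.file.names is a hypothetical helper, not part of LSDsensitivity, that expands the patterns for given control numbers. Its output can be passed to file.exists() to spot missing input files before calling read.doe.lsd. Uncompressed .res files are assumed unless gz = TRUE.

```r
# Hypothetical helper (NOT part of LSDsensitivity): expand the file-name
# patterns listed above for given LSD control numbers XX, YY and (optionally) ZZ
doe.file.names <- function( folder, baseName, XX, YY, ZZ = NULL,
                            outVar = NULL, gz = FALSE ) {
  XX <- as.integer( XX )          # integers avoid "1e+05"-style names
  YY <- as.integer( YY )
  res <- if( gz ) ".res.gz" else ".res"
  p <- function( ... ) file.path( folder, paste0( baseName, ... ) )

  files <- list( config = p( ".lsd" ),                        # baseline
                 ranges = p( ".sa" ),                         # factor ranges
                 doe    = p( "_", XX, "_", YY, ".csv" ),      # DoE specification
                 data   = p( "_", XX : YY, res ) )            # YY-XX+1 files
  if( ! is.null( ZZ ) ) {
    ZZ <- as.integer( ZZ )
    files$valid     <- p( "_", YY + 1L, "_", ZZ, ".csv" )     # validation spec.
    files$validData <- p( "_", ( YY + 1L ) : ZZ, res )        # ZZ-YY files
  }
  if( ! is.null( outVar ) ) {                                 # response files
    files$resp <- p( "_", XX, "_", YY, "_", outVar, ".csv" )
    if( ! is.null( ZZ ) )
      files$valResp <- p( "_", YY + 1L, "_", ZZ, "_", outVar, ".csv" )
  }
  files
}

# usage: list the expected files and check which input files are missing
f <- doe.file.names( "sobol", "Sim3", XX = 1, YY = 3, ZZ = 5, outVar = "var3" )
f$validData                               # "sobol/Sim3_4.res" "sobol/Sim3_5.res"
inputs <- unlist( f[ c( "config", "ranges", "doe", "data", "valid", "validData" ) ] )
inputs[ ! file.exists( inputs ) ]         # missing input files, if any
```

The response files (resp, valResp) are left out of the existence check because read.doe.lsd creates them when they are absent.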
When posit is supplied together with saveVars or instance, the variable selection process is done in two steps. First, the column names set by saveVars and instance are selected. Second, the instances defined by posit are selected from the first selection set. See select.colnames.lsd and select.colattrs.lsd for examples of how to apply advanced selection options.
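The two-step selection can be illustrated with plain string matching. For illustration only, the sketch assumes column names of the form variable_position (for example "var1_1_2", where "1_2" is the object position). It also assumes that "fixed" means exact matching, "glob" allows the * and ? wildcards, and "regex" takes a regular expression. select.cols is a hypothetical name. It ignores instance and integer posit vectors; see the functions cited above for the actual selection options.

```r
# Hypothetical sketch of the two-step selection (NOT the package code).
# ASSUMPTION: column names look like "<variable>_<object position>", e.g.
# "var1_1_2", and variable names contain no underscore.
select.cols <- function( cols, saveVars, posit = NULL,
                         posit.match = c( "fixed", "glob", "regex" ) ) {
  posit.match <- match.arg( posit.match )
  vars <- sub( "_.*$", "", cols )       # variable part of each column name
  pos  <- sub( "^[^_]*_", "", cols )    # object-position part

  keep <- vars %in% saveVars            # step 1: requested variables only

  if( ! is.null( posit ) ) {            # step 2: requested positions only
    hit <- if( posit.match == "fixed" ) {
      pos %in% posit                    # exact matching
    } else {
      pat <- if( posit.match == "glob" ) utils::glob2rx( posit ) else posit
      Reduce( `|`, lapply( pat, grepl, x = pos ) )
    }
    keep <- keep & hit
  }
  cols[ keep ]
}

cols <- c( "var1_1_1", "var1_1_2", "var2_1_1", "var2_2_1", "var3_1_1" )
select.cols( cols, c( "var1", "var2" ), posit = "1_1" )
# "var1_1_1" "var2_1_1"
select.cols( cols, c( "var1", "var2" ), posit = "1_*", posit.match = "glob" )
# "var1_1_1" "var1_1_2" "var2_1_1"
select.cols( cols, "var2", posit = "^2_", posit.match = "regex" )
# "var2_2_1"
```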
Value
The function returns an object/list of class lsd-doe containing all the experimental data and the corresponding results regarding the selected reference variable outVar. This includes the data for the out-of-sample (external) validation of the produced meta-models, if available, as well as the DoE(s) details required by the package meta-modelling tools (elementary.effects.lsd, kriging.model.lsd, and polynomial.model.lsd).
List components:
doe |
the DoE data. Can be a tabular data frame if |
resp |
the DoE response data table. |
valid |
the external validation DoE data. Can be a tabular data frame if |
valResp |
the external validation DoE response data table. |
facLim |
the factor limit ranges table.
facLimLo |
the factor minimum values.
facLimUp |
the factor maximum values.
facDef |
the factor default/calibration values.
saVarName |
the sensitivity analysis reference variable name, as defined by |
Note
See the note in LSDsensitivity-package for step-by-step instructions on how to perform the complete sensitivity analysis process using LSD and R.
Please refer to the LSD documentation for details on the files produced by its sensitivity analysis tools, in particular when using NOLH, Elementary Effects and MC Range Sensitivity Analysis sampling:
LSD documentation is available at https://www.labsimdev.org/ and the latest binaries and source code can be downloaded at https://github.com/marcov64/Lsd/.
See Also
elementary.effects.lsd(), kriging.model.lsd(), polynomial.model.lsd(), list.files.lsd()
Examples
# get the example directory name
path <- system.file( "extdata/sobol", package = "LSDsensitivity" )
# Steps to use this function:
# 1. define the variables you want to use in the analysis
# 2. optionally, define special handling functions (see examples below)
# 3. load data from a LSD simulation saved results using read.doe.lsd
# 4. perform the elementary effects analysis applying elementary.effects.lsd
# define the existing variables, the ones to log-transform, and the new ones to add
lsdVars <- c( "var1", "var2", "var3" )
logVars <- c( "var1", "var3" )
newVars <- c( "var4" )
# load data from an LSD simulation baseline configuration named "Sim3.lsd" to
# perform sensitivity analysis on the variable named "var3"
# there are two groups of sampled data (DoEs) created by LSD being read
# just use no handling functions for now, see possible examples below
dataSet <- read.doe.lsd( path, # data files folder
"Sim3", # data files base name (same as .lsd file)
"var3", # variable name to perform the sens. analysis
does = 2, # # of experiments (data + external validation)
iniDrop = 0, # initial time steps to drop (0=none)
nKeep = -1, # number of time steps to keep (-1=all)
saveVars = lsdVars, # LSD variables to keep in dataset
addVars = newVars, # new variables to add to the LSD dataset
eval.stat = "median", # use median to evaluate runs
rm.temp = FALSE, # reuse temporary speedup files
rm.outl = FALSE, # do not remove outliers from dataset
lim.outl = 10, # limit non-outliers deviation (# of std. devs.)
quietly = FALSE ) # show information during processing
print( dataSet$doe ) # the design of the experiment sample done in LSD
print( dataSet$valid ) # the external validation sample
print( dataSet$saVarName ) # the variable for which the response was analyzed
print( dataSet$resp ) # analysis of the response of the selected variable
#### OPTIONAL HANDLING FUNCTION EXAMPLES ####
# eval.vars( ) EXAMPLE 1
# the definition of a function to take the log of the required variables and
# compute the new ones (for use on pool = TRUE databases)
eval.vars <- function( dataSet, allVars ) {
tsteps <- nrow( dataSet ) # number of time steps in simulated data set
nvars <- ncol( dataSet ) # number of variables in data set (including new ones)
# ---- Recompute values for existing variables ----
for( var in allVars ) {
if( var %in% logVars ) { # take the log values of selected variables
try( dataSet[ , var ] <- log( dataSet[ , var ] ), silent = TRUE ) # log(0) = -Inf, log(<0) = NaN
}
}
# ---- Calculate values of new variables (added to LSD data set) ----
dataSet[ , "var4" ] <- dataSet[ , "var1" ] + dataSet[ , "var2" ] # example of new var
return( dataSet )
}
# load data again, now using new variable var4 for analysis
dataSet <- read.doe.lsd( path, # data files folder
"Sim3", # data files base name (same as .lsd file)
"var4", # variable name to perform the sens. analysis
does = 2, # # of experiments (data + external validation)
iniDrop = 0, # initial time steps to drop (0=none)
nKeep = -1, # number of time steps to keep (-1=all)
saveVars = lsdVars, # LSD variables to keep in dataset
addVars = newVars, # new variables to add to the LSD dataset
eval.vars = eval.vars,# function to evaluate/adjust/expand the dataset
rm.temp = TRUE, # remove temporary speedup files
rm.outl = FALSE, # do not remove outliers from dataset
lim.outl = 10 ) # limit non-outliers deviation (# of std. devs.)
print( dataSet$doe ) # the design of the experiment sample done in LSD
print( dataSet$valid ) # the external validation sample
print( dataSet$saVarName ) # the variable for which the response was analyzed
print( dataSet$resp ) # analysis of the response of the selected variable
# eval.vars( ) EXAMPLE 2
# the definition of a function to compute the new variables
# (for use on pool = FALSE databases)
# ---- 4D data frame version (when pool = FALSE) ----
eval.vars <- function( data, vars ) {
tsteps <- length( data [ , 1, 1, 1 ] )
nvars <- length( data [ 1, , 1, 1 ] )
insts <- length( data [ 1, 1, , 1 ] )
samples <- length( data [ 1, 1, 1, ] )
# ---- Compute values for new variables, preventing infinite values ----
for( m in 1 : samples ) # for all MC samples (files)
for( j in 1 : insts ) # all instances
for( i in 1 : tsteps ) # all time steps
for( var in vars ) { # and all variables
if( var == "var4" ) {
# Normalization of key variables using the period average size
avg <- mean( data[ i, "var2", , m ], na.rm = TRUE ) # avoid masking mean()
if( is.finite( avg ) && avg != 0 )
data[ i, var, j, m ] <- data[ i, "var2", j, m ] / avg
else
data[ i, var, j, m ] <- NA
}
}
return( data )
}
# load data again, now using variable var2 for analysis, without pooling MC runs
dataSet <- read.doe.lsd( path, # data files folder
"Sim3", # data files base name (same as .lsd file)
"var2", # variable name to perform the sens. analysis
does = 2, # # of experiments (data + external validation)
iniDrop = 0, # initial time steps to drop (0=none)
nKeep = -1, # number of time steps to keep (-1=all)
pool = FALSE, # don't pool MC runs
saveVars = lsdVars, # LSD variables to keep in dataset
addVars = newVars, # new variables to add to the LSD dataset
eval.vars = eval.vars,# function to evaluate/adjust/expand the dataset
rm.temp = TRUE, # remove temporary speedup files
rm.outl = FALSE, # do not remove outliers from dataset
lim.outl = 10 ) # limit non-outliers deviation (# of std. devs.)
print( dataSet$doe ) # the design of the experiment sample done in LSD
print( dataSet$valid ) # the external validation sample
print( dataSet$saVarName ) # the variable for which the response was analyzed
print( dataSet$resp ) # analysis of the response of the selected variable
# eval.run( ) EXAMPLE
# the definition of a function to evaluate a point in the DoE, associating a result
# with it (in terms of average result and dispersion/S.D.)
# the example evaluates the fat-tailedness of the distribution of the selected
# variable, using the Subbotin distribution b parameter as metric (response)
library( normalp )
eval.run <- function( data, run, varIdx, conf ) {
obs <- discards <- 0
# ------ Compute Subbotin fits for each run ------
bSubbo <- rep( NA, dim( data )[ 3 ] )
for( i in 1 : dim( data )[ 3 ] ) {
x <- data[[ run, varIdx, i ]]
sf <- paramp( x )
sf$p <- estimatep( x, mu = sf$mean, p = sf$p, method = "inverse" )
if( sf$p >= 1 ) {
bSubbo[ i ] <- sf$p
obs <- obs + 1
} else {
bSubbo[ i ] <- NA
discards <- discards + 1
}
}
return( list( mean( bSubbo, na.rm = TRUE ),
var( bSubbo, na.rm = TRUE ), obs, discards ) )
}
# load data again, now using the defined evaluation function
dataSet <- read.doe.lsd( path, # data files folder
"Sim3", # data files base name (same as .lsd file)
"var2", # variable name to perform the sens. analysis
does = 2, # # of experiments (data + external validation)
iniDrop = 0, # initial time steps to drop (0=none)
nKeep = -1, # number of time steps to keep (-1=all)
saveVars = lsdVars, # LSD variables to keep in dataset
addVars = newVars, # new variables to add to the LSD dataset
eval.run = eval.run, # function to evaluate the DoE point response
rm.temp = TRUE, # remove temporary speedup files
rm.outl = FALSE, # do not remove outliers from dataset
lim.outl = 10 ) # limit non-outliers deviation (# of std. devs.)
print( dataSet$doe ) # the design of the experiment sample done in LSD
print( dataSet$valid ) # the external validation sample
print( dataSet$saVarName ) # the variable for which the response was analyzed
print( dataSet$resp ) # analysis of the response of the selected variable