R: Leave-one-out cross validation of model selection

sim_modelSel {poolABC}

R Documentation

Leave-one-out cross validation of model selection

Description

This function performs a simulation study to assess the quality of model selection with ABC. This is done by performing a leave-one-out cross validation via subsequent calls to the function modelSelect().

Usage

sim_modelSel(index, sumstats, nval, tol, warning = FALSE)

Arguments

`index`	is a vector of model indices. This can be a character vector of model names, repeated as many times as there are simulations for each model. This vector will be coerced to a factor and it must have the same length as `nrow(sumstats)` to indicate which row of the `sumstats` matrix belongs to which model.
`sumstats`	is a vector or matrix containing the simulated summary statistics for all the models. Each row or vector entry should be a different simulation and each column of a matrix should be a different statistic. The order must be the same as the order of the models in the `index` vector.
`nval`	a numerical value defining the size of the cross-validation sample for each model.
`tol`	is a numerical value, indicating the required proportion of points nearest the target values (tolerance).
`warning`	logical; if FALSE (default), warnings produced while running this function are not displayed. These warnings mainly arise when simulations are accepted for just one of the models.
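
The alignment between `index` and `sumstats` described above can be checked before calling the function. The following is a minimal sketch on made-up data; the names `sims_per_model`, `sumstats_toy` and `index_toy` are illustrative assumptions and are not part of poolABC.

```r
# toy data: 2 models, 50 simulations each, 3 summary statistics (all hypothetical)
set.seed(1)
sims_per_model <- 50
sumstats_toy <- rbind(
  matrix(rnorm(sims_per_model * 3, mean = 0), ncol = 3),  # rows simulated under model1
  matrix(rnorm(sims_per_model * 3, mean = 1), ncol = 3)   # rows simulated under model2
)

# one label per row, in the same order as the rows of the matrix
index_toy <- rep(c("model1", "model2"), each = sims_per_model)

# the index must have one entry per simulation
stopifnot(length(index_toy) == nrow(sumstats_toy))

# number of simulations available for each model
table(index_toy)
```

Building the labels with `rep(..., each = )` keeps them in the same block order as the rows bound together with `rbind()`, which is the ordering the arguments above require.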

Details

One simulation is randomly selected from each model to be a validation simulation, while all the other simulations are used as training simulations. Each validation simulation is then used as the target of the modelSelect() function and posterior model probabilities are estimated.

Please note that the actual size of the cross-validation sample is nval*the number of models. This is because nval cross-validation estimation steps are performed for each model.
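
The procedure described above can be sketched in base R. This is only an illustration under stated assumptions: `loo_model_cv()` is a hypothetical helper, not the poolABC implementation, and it uses a plain rejection step with a Euclidean distance on standardised statistics, which may differ from what modelSelect() does internally. It assumes at least two models and more than `nval` simulations per model.

```r
# Hypothetical helper, NOT the poolABC implementation:
# leave-one-out cross validation of model selection with simple rejection.
loo_model_cv <- function(index, sumstats, nval, tol) {
  index <- as.factor(index)
  sumstats <- as.matrix(sumstats)
  models <- levels(index)

  # standardise each statistic so that all contribute equally to the distance
  scaled <- scale(sumstats)

  # draw nval validation rows from each model
  cvsamples <- unlist(lapply(models, function(m) sample(which(index == m), nval)))

  # number of training simulations accepted in each trial
  nacc <- ceiling((nrow(scaled) - 1) * tol)

  probs <- t(sapply(cvsamples, function(i) {
    # Euclidean distance from the validation row to every training row
    d <- sqrt(colSums((t(scaled[-i, , drop = FALSE]) - scaled[i, ])^2))
    accepted <- index[-i][order(d)[seq_len(nacc)]]
    # estimated model probability = share of accepted simulations per model
    as.vector(table(accepted)) / nacc
  }))
  colnames(probs) <- models

  list(cvsamples = cvsamples,
       true = as.character(index[cvsamples]),
       estimated = models[max.col(probs, ties.method = "first")],
       model.probs = probs,
       models = models)
}

# toy usage: two well separated models with 100 simulations each (made-up data)
set.seed(42)
toy_stats <- rbind(matrix(rnorm(300, mean = 0), ncol = 3),
                   matrix(rnorm(300, mean = 2), ncol = 3))
toy_index <- rep(c("model1", "model2"), each = 100)
cv <- loo_model_cv(toy_index, toy_stats, nval = 5, tol = 0.1)
length(cv$cvsamples)  # nval * number of models = 10
```

Each validation row is removed from the training set before distances are computed, so a simulation can never be accepted as its own nearest neighbour.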

Value

a list with the following elements:

`cvsamples`	is a vector of length `nval` times the number of models, indicating which rows of the `sumstats` input were used as validation values for each model.
`true`	a character vector of the true models.
`estimated`	a character vector of the estimated models.
`model.probs`	a matrix with the estimated model probabilities. Each row of the matrix represents a different cross-validation trial.
`models`	a character vector with the designation of the models.
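
One way to summarise a result of this shape is a confusion matrix of true against estimated models. The sketch below uses a small hand-made list named `cv_toy` that mimics some of the elements listed above; its values are invented for illustration and are not output of the package.

```r
# hand-made stand-in for the returned list (values are invented)
cv_toy <- list(
  true = c("model1", "model1", "model1", "model2", "model2", "model2"),
  estimated = c("model1", "model1", "model2", "model2", "model2", "model2"),
  models = c("model1", "model2")
)

# rows: true model, columns: estimated model
conf_mat <- table(true = factor(cv_toy$true, levels = cv_toy$models),
                  estimated = factor(cv_toy$estimated, levels = cv_toy$models))
conf_mat

# proportion of validation simulations assigned to the correct model
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

Setting the factor levels from `models` keeps the table square even when one of the models is never estimated.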

Examples

# load the matrix with simulated summary statistics
data(sumstats)

# select one simulation (row 10) to act as a target
# note: sim_modelSel() draws its own validation simulations, so `target` is not passed to it
target <- sumstats[10, ]

# create a "fake" vector of model indices
# this assumes that half the simulations were from one model and the other half from another model
# this is not true but serves as an example of how to use this function
index <- c(rep("model1", nrow(sumstats)/2), rep("model2", nrow(sumstats)/2))

# perform a leave-one-out cross validation of model selection
sim_modelSel(index = index, sumstats = sumstats, nval = 10, tol = 0.1)

[Package poolABC version 1.0.0 Index]