stability {stablelearner}    R Documentation

Stability Assessment for Results from Supervised Statistical Learning

Description

Stability assessment of results from supervised statistical learning (e.g., recursive partitioning, support vector machines, neural networks). The procedure involves the pairwise comparison of results generated from learning samples randomly drawn from the original data set or directly from the data-generating process (if available).

Usage

  stability(x, ..., data = NULL, control = stab_control(), weights = NULL, 
    applyfun = NULL, cores = NULL, names = NULL)

Arguments

x

fitted model object. Any model object whose class is registered in LearnerList can be used. Users can add classes to LearnerList for the current R session; see addLearner.

...

additional fitted model objects.

data

an optional data.frame or a data-generating function. By default, the learning data from x is used (if this can be inferred from getCall(x)).

control

a list with control parameters, see stab_control.

weights

an optional matrix of dimension n * B that can be used to weight the observations of the original learning data when the models are refitted. If weights = TRUE, the weights are computed internally according to the sampler defined in control. If weights = NULL (default), no case-weights are used and the sampler defined in control is applied to the original data set.

applyfun

an lapply-like function. The default is to use lapply unless cores is specified, in which case mclapply is used (for multicore computations on platforms that support them).

cores

integer. The number of cores to use in multicore computations using mclapply (see above).

names

a character vector specifying a name for each fitted model object. By default, the objects are named by their class.
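The interplay of weights, applyfun and cores can be illustrated with a minimal base-R sketch. It assumes bootstrap counts as case-weights and uses lm() as a stand-in learner; bootstrap_weights() and make_applyfun() are hypothetical helpers, not functions of stablelearner, whose internal sampler and refitting code may differ.

```r
## Hypothetical sketch (not stablelearner code): bootstrap case-weights
## and an lapply-like refit loop, with lm() as a stand-in learner.

## n x B matrix of case-weights: column b counts how often each of the
## n observations is drawn in the b-th bootstrap sample (assumed sampler)
bootstrap_weights <- function(n, B) {
  vapply(seq_len(B),
         function(b) tabulate(sample.int(n, n, replace = TRUE), nbins = n),
         integer(n))
}

## Mirrors the documented default: lapply unless cores is given,
## otherwise parallel::mclapply with that many cores
make_applyfun <- function(cores = NULL) {
  if (is.null(cores)) {
    lapply
  } else {
    function(X, FUN, ...) parallel::mclapply(X, FUN, ..., mc.cores = cores)
  }
}

set.seed(1)
n <- nrow(iris)
B <- 10
w <- bootstrap_weights(n, B)
applyfun <- make_applyfun(cores = NULL)

## Refit the model once per column of w; observations with weight 0
## do not contribute to that particular fit
fits <- applyfun(seq_len(B), function(b)
  lm(Sepal.Length ~ Petal.Length, data = iris, weights = w[, b]))

## One row of coefficients per repetition (B x 2)
coefs <- t(sapply(fits, coef))
```

Each column of w corresponds to one repetition, so refitting with case-weights avoids physically resampling the data set.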

Details

Assesses the (overall) stability of a result from supervised statistical learning by quantifying the similarity of realizations from the distribution of possible results (given the algorithm, the formulated model, the data-generating process, the sample size, etc.). The stability distribution is estimated by repeatedly training the algorithm on two different learning samples and measuring the similarity between the two resulting fits with a similarity metric. The learning samples are generated by sampling from the learning data or, in a simulation study, from the data-generating process. For more details, see Philipp et al. (2018).
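The estimation procedure described above can be sketched in base R. This is not the stablelearner implementation: it assumes a nearest-centroid classifier as a stand-in learner, bootstrap learning samples, the full data as evaluation sample, and the proportion of identical class predictions as the similarity measure. fit_centroids(), predict_centroids() and stability_sketch() are hypothetical names.

```r
## Hypothetical sketch of the stability estimate (not stablelearner code).

## Stand-in learner: class-wise means of the numeric predictors
fit_centroids <- function(x, y) {
  t(sapply(split(as.data.frame(x), y, drop = TRUE), colMeans))
}

## Predict the class whose centroid is closest (squared Euclidean distance)
predict_centroids <- function(centroids, x) {
  x <- as.matrix(x)
  d <- matrix(NA_real_, nrow(x), nrow(centroids))
  for (k in seq_len(nrow(centroids))) {
    d[, k] <- colSums((t(x) - centroids[k, ])^2)
  }
  rownames(centroids)[apply(d, 1, which.min)]
}

stability_sketch <- function(data, response, B = 50) {
  x <- as.matrix(data[, setdiff(names(data), response)])
  y <- data[[response]]
  n <- nrow(data)
  sval <- numeric(B)
  for (b in seq_len(B)) {
    ## two independent bootstrap learning samples (assumed sampler)
    i1 <- sample.int(n, n, replace = TRUE)
    i2 <- sample.int(n, n, replace = TRUE)
    m1 <- fit_centroids(x[i1, , drop = FALSE], y[i1])
    m2 <- fit_centroids(x[i2, , drop = FALSE], y[i2])
    ## similarity of the two results on the evaluation sample (all rows)
    sval[b] <- mean(predict_centroids(m1, x) == predict_centroids(m2, x))
  }
  sval
}

set.seed(42)
sval <- stability_sketch(iris, "Species", B = 50)
summary(sval)
```

The B similarity values roughly correspond to what the sval component stores for a single similarity measure; their distribution approximates the stability distribution for this learner and sample size.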

Value

For a single fitted model object, stability returns an object of class "stablelearner" with the following components:

call

the call from the model object x,

learner

the information about the learner retrieved from LearnerList,

B

the number of repetitions,

sval

a matrix containing the estimated similarity values for each similarity measure specified in control,

sampstat

a list containing, for each repetition, the size of the learning samples (ls), the size of the overlap between the two learning samples (lo), the size of the evaluation sample (es), and the size of the overlap between the evaluation sample and the learning samples (eo),

data

a language object referring to the data.frame or the data-generating function used for assessing the stability,

control

a list with control parameters used for assessing the stability.

For several fitted model objects, stability returns an object of class "stablelearnerList" which is a list of objects of class "stablelearner".
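The four entries of sampstat can be illustrated for a single repetition from the index vectors of the two learning samples and the evaluation sample. This sketch assumes that ls and es count draws, that lo and eo count distinct shared observations, and that eo refers to the union of both learning samples; samp_stats() is a hypothetical helper, and stablelearner may define these quantities differently.

```r
## Hypothetical sketch: sampling statistics for one repetition.
## i1, i2: row indices of the two learning samples; ie: evaluation sample
samp_stats <- function(i1, i2, ie) {
  c(ls = length(i1),                             # draws per learning sample
    lo = length(intersect(i1, i2)),              # distinct obs. in both
    es = length(ie),                             # evaluation sample size
    eo = length(intersect(ie, union(i1, i2))))   # eval. obs. also learned on
}

set.seed(3)
n <- 150
i1 <- sample.int(n, n, replace = TRUE)
i2 <- sample.int(n, n, replace = TRUE)
ie <- seq_len(n)   # full data as evaluation sample (assumption)
st <- samp_stats(i1, i2, ie)
st
```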

References

Philipp M, Rusch T, Hornik K, Strobl C (2018). “Measuring the Stability of Results from Supervised Statistical Learning”. Journal of Computational and Graphical Statistics, 27(4), 685–700. doi:10.1080/10618600.2018.1473779

See Also

boxplot.stablelearnerList, summary.stablelearner

Examples



## assessing the stability of a single result
library("stablelearner")
library("partykit")
r1 <- ctree(Species ~ ., data = iris)
stab <- stability(r1)
summary(stab)

## assessing the stability of several results
library("rpart")
r2 <- rpart(Species ~ ., data = iris)
stab <- stability(r1, r2, control = stab_control(seed = 0))
summary(stab, names = c("ctree", "rpart"))

## using case-weights instead of resampling
stability(r1, weights = TRUE)

## using self-defined case-weights
n <- nrow(iris)
B <- 500
w <- array(sample(c(0, 1), size = n * B * 3, replace = TRUE), dim = c(n, B, 3))
stability(r1, weights = w)

## assessing stability for a given data-generating process
my_dgp <- function() dgp_twoclass(n = 100, p = 2, noise = 4, rho = 0.2)
res <- ctree(class ~ ., data = my_dgp())
stability(res, data = my_dgp)
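A data-generating function does not have to come from dgp_twoclass. Like my_dgp above, the hypothetical base-R function below can be called without arguments and returns a new data.frame on each call; it is an assumption that stability() accepts any such function, and my_dgp2 and its model are illustrative only, not part of stablelearner.

```r
## Hypothetical data-generating function (illustration only)
my_dgp2 <- function(n = 100) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  ## binary class driven by x1 + x2 plus Gaussian noise
  class <- factor(ifelse(x1 + x2 + rnorm(n) > 0, "A", "B"),
                  levels = c("A", "B"))
  data.frame(x1 = x1, x2 = x2, class = class)
}

set.seed(7)
d1 <- my_dgp2()
d2 <- my_dgp2()   # a fresh sample from the same process

## analogous to the example above (requires partykit and stablelearner):
## res2 <- ctree(class ~ ., data = my_dgp2())
## stability(res2, data = my_dgp2)
```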



[Package stablelearner version 0.1-5 Index]