stab_control {stablelearner} | R Documentation |
Control for Supervised Stability Assessments
Description
Various parameters that control aspects of the stability assessment performed
via stability.
Usage
stab_control(B = 500, measure = list(tvdist, ccc), sampler = "bootstrap",
evaluate = "OOB", holdout = 0.25, seed = NULL, na.action = na.exclude,
savepred = TRUE, silent = TRUE, ...)
Arguments
B |
an integer value specifying the number of repetitions. The default
is 500.
measure |
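a list of similarity measure (generating) functions. Those
should either be functions of p1 and p2 or similarity measure generator
functions (see Details below). The default is list(tvdist, ccc).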
sampler |
a resampling (generating) function. Either this should be a
function of |
evaluate |
a character specifying the evaluation strategy to be applied
(see Details below). The default is "OOB".
holdout |
a numeric value between zero and one that specifies the
proportion of observations held out for evaluation over all repetitions.
Only used if evaluate = "OOS". The default is 0.25.
seed |
a single value, interpreted as an integer, see
set.seed and Details below. The default is NULL.
na.action |
a function which indicates what should happen when the
predictions of the results contain NAs. The default is na.exclude.
savepred |
logical. Should the predictions from each iteration be
saved? If |
silent |
logical. If |
... |
arguments passed to |
Details
With the argument measure, one or more measures can be defined that are
used to assess the stability of a result from supervised statistical learning
by stability. Predefined similarity measures for the classification and the
regression case are listed in similarity_measures_classification and
similarity_measures_regression.
Users can define their own similarity functions f(p1, p2) that must
return a single numeric value for the similarity between two results trained on
resampled data sets. Such a function must take the arguments p1 and p2.
In the classification case, p1 and p2 are probability matrices of
size m * K, where m is the number of predicted observations (size
of the evaluation sample) and K is the number of classes. In the
regression case, p1 and p2 are numeric vectors of length m.
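As a minimal sketch of what such user-defined functions could look like, the two functions below follow the f(p1, p2) contract described above using base R only. The names prob_agreement and pred_correlation are hypothetical and not part of the package; the toy matrices are made up for illustration.

```r
## Hypothetical similarity function for the classification case.
## p1, p2: m x K matrices of predicted class probabilities.
## Returns one minus the mean total variation distance between the rows,
## so 1 means identical predictions and 0 means maximal disagreement.
prob_agreement <- function(p1, p2) {
  stopifnot(is.matrix(p1), is.matrix(p2), identical(dim(p1), dim(p2)))
  1 - mean(rowSums(abs(p1 - p2)) / 2)
}

## Hypothetical similarity function for the regression case.
## p1, p2: numeric vectors of length m.
## Returns the Pearson correlation between the two prediction vectors.
pred_correlation <- function(p1, p2) {
  stopifnot(is.numeric(p1), is.numeric(p2), length(p1) == length(p2))
  stats::cor(p1, p2)
}

## Toy inputs: m = 3 observations, K = 2 classes
p1 <- matrix(c(0.9, 0.1,
               0.2, 0.8,
               0.5, 0.5), ncol = 2, byrow = TRUE)
p2 <- matrix(c(0.7, 0.3,
               0.2, 0.8,
               0.4, 0.6), ncol = 2, byrow = TRUE)

prob_agreement(p1, p2)
pred_correlation(c(1, 2, 3, 4), c(1.1, 1.9, 3.2, 3.9))
```

Both functions return a single numeric value, as required. Such a function would presumably be supplied as stab_control(measure = list(prob_agreement)); whether a plain function is accepted there without a generator wrapper is an assumption of this sketch.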
A different way to implement new similarity functions for the current R
session is to define a similarity measure generator function. This is a
function without arguments that returns a list of five elements: the
name of the similarity measure; the function that computes the similarity
between the predictions, as described above; a character vector
specifying the response types for which the similarity measure can be used;
a list containing two numeric elements, lower and upper, that
specify the range of values of the similarity measure; and a function that
inverts (or reverses) the similarity values such that higher values indicate
higher stability. The last element can be set to NULL if higher similarity
values already indicate higher stability. These elements should be named
name, measure, classes, range and reverse, respectively.
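The generator structure described above can be sketched as follows. Both generators are hypothetical examples, not functions shipped with the package, and the response type labels used in classes ("factor", "numeric", "integer") are assumptions about how response types are named.

```r
## Hypothetical generator for a label-agreement measure (classification).
label_agreement <- function() {
  list(
    name = "Label agreement",
    ## share of observations for which both results predict the same class
    measure = function(p1, p2) {
      mean(max.col(p1, ties.method = "first") ==
           max.col(p2, ties.method = "first"))
    },
    ## assumption: "factor" is the label used for categorical responses
    classes = "factor",
    range = list(lower = 0, upper = 1),
    ## higher values already indicate higher stability, so no reversal
    reverse = NULL
  )
}

## Hypothetical generator for a distance measure (regression), where
## larger values mean less stability and therefore need to be reversed.
mean_abs_distance <- function() {
  list(
    name = "Mean absolute distance",
    measure = function(p1, p2) mean(abs(p1 - p2)),
    ## assumption: labels used for numeric responses
    classes = c("numeric", "integer"),
    range = list(lower = 0, upper = Inf),
    reverse = function(x) -x
  )
}

## Inspect the generated list and apply the measure to toy probabilities
m <- label_agreement()
names(m)
p1 <- matrix(c(0.9, 0.1, 0.2, 0.8, 0.5, 0.5), ncol = 2, byrow = TRUE)
p2 <- matrix(c(0.7, 0.3, 0.2, 0.8, 0.4, 0.6), ncol = 2, byrow = TRUE)
m$measure(p1, p2)
```

Note that list(reverse = NULL) keeps the element reverse in the list, so all five names are present even when no reversal is needed.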
The argument evaluate can be used to specify the evaluation strategy.
If set to "ALL", all observations in the original data set are used for
evaluation. If set to "OOB", only the pairwise out-of-bag observations
are used for evaluation within each repetition. If set to "OOS", a
fraction (defined by holdout) of the observations in the original data
set is randomly sampled and used for evaluation, but not for training, over all
repetitions.
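To make the three strategies concrete, the following base R sketch builds the evaluation index sets for one repetition on a toy data set of 20 observations. It illustrates the idea only and is not the package's internal code; in particular, the use of floor() to size the holdout set and all variable names are assumptions.

```r
set.seed(1)
n <- 20                      # size of a toy data set
idx <- seq_len(n)

## Two bootstrap learning samples for one repetition
s1 <- sample(idx, n, replace = TRUE)
s2 <- sample(idx, n, replace = TRUE)

## "ALL": every observation of the original data set is evaluated
eval_all <- idx

## "OOB": observations that appear in neither learning sample of the pair
eval_oob <- setdiff(idx, union(s1, s2))

## "OOS": a fixed fraction is set aside once and never used for training
holdout <- 0.25
eval_oos <- sort(sample(idx, size = floor(holdout * n)))
train_pool <- setdiff(idx, eval_oos)
s1_oos <- sample(train_pool, length(train_pool), replace = TRUE)
s2_oos <- sample(train_pool, length(train_pool), replace = TRUE)

length(eval_all)
length(eval_oob)
length(eval_oos)
```

The "OOB" set changes with every pair of learning samples, whereas the "OOS" set stays the same across repetitions.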
The argument seed can be used to make stability assessments comparable
when comparing the stability of different results that were trained on the same
data set. By default, seed is set to NULL and the learning samples
are sampled independently for each fitted model object passed to
stability. If seed is set to a specific number, the seed is set for each
fitted model object before the learning samples are generated, using the
"L'Ecuyer-CMRG" random number generator (see set.seed). This
guarantees identical learning samples for each stability assessment and, thus,
comparability of the stability assessments between the results.
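The effect of a fixed seed can be sketched in base R as follows. The helper draw_learning_samples is hypothetical and only mimics the idea of re-seeding before the learning samples are drawn; it is not the package's implementation.

```r
## Hypothetical helper: draw B bootstrap index vectors of size n.
## If a seed is given, the RNG is re-seeded with the "L'Ecuyer-CMRG"
## generator first. Side effect: the RNG kind of the session changes.
draw_learning_samples <- function(n, B, seed = NULL) {
  if (!is.null(seed)) set.seed(seed, kind = "L'Ecuyer-CMRG")
  replicate(B, sample.int(n, n, replace = TRUE))
}

## Same seed: both calls return the same n x B matrix of indices
a <- draw_learning_samples(n = 10, B = 3, seed = 42)
b <- draw_learning_samples(n = 10, B = 3, seed = 42)
identical(a, b)
```

With the package itself, the corresponding setting would be stab_control(seed = 42), passed as the control argument to each stability call that is to be compared.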
See Also
Examples
library("partykit")
res <- ctree(Species ~ ., data = iris)
## fewer repetitions
stability(res, control = stab_control(B = 100))
## Not run:
## change similarity measure
stability(res, control = stab_control(measure = list(bdist)))
## change evaluation strategy
stability(res, control = stab_control(evaluate = "ALL"))
stability(res, control = stab_control(evaluate = "OOS"))
## change resampling strategy to subsampling
stability(res, control = stab_control(sampler = subsampling))
stability(res, control = stab_control(sampler = subsampling, evaluate = "ALL"))
stability(res, control = stab_control(sampler = subsampling, evaluate = "OOS"))
## change resampling strategy to splithalf
stability(res, control = stab_control(sampler = splithalf, evaluate = "ALL"))
stability(res, control = stab_control(sampler = splithalf, evaluate = "OOS"))
## End(Not run)