familiarDataElement-class {familiar} | R Documentation |
Data container for evaluation data.
Description
Most attributes of the familiarData object are objects of the familiarDataElement class. This (super-)class is used to allow for standardised aggregation and processing of evaluation data.
Slots
data
Evaluation data, typically a data.table or list.
identifiers
Identifiers of the data, e.g. the generating model name, learner, etc.
detail_level
Sets the level at which results are computed and aggregated.
- ensemble: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.
- hybrid (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to ensemble, where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.
- model: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.
Note that each level of detail has a different interpretation for bootstrap confidence intervals. For ensemble and model, these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by a respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For hybrid, it represents the range where any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using hybrid are at least as wide as those for ensemble. hybrid offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified, model.
Some child classes do not use this parameter.
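The rule for where point estimates come from at each detail level can be sketched as follows. This is only an illustration of the description above, not familiar's internal code: the function point_estimate_source and its min_estimates argument are hypothetical names, and the threshold of 20 is taken from the text.

```r
# Hypothetical helper (not part of familiar): reports whether point estimates
# are taken directly from the trained models or generated from bootstraps.
point_estimate_source <- function(n_models,
                                  detail_level = "hybrid",
                                  min_estimates = 20L) {
  detail_level <- match.arg(detail_level, c("ensemble", "hybrid", "model"))

  # hybrid: with enough trained models, each model yields one point estimate.
  if (detail_level == "hybrid" && n_models >= min_estimates) {
    return("models")
  }

  # ensemble, model, or hybrid with too few models: bootstraps are needed.
  "bootstraps"
}

point_estimate_source(25)              # hybrid, 25 trained models
point_estimate_source(5)               # hybrid, 5 trained models
point_estimate_source(25, "ensemble")  # ensemble level
```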
estimation_type
Sets the type of estimation that should be possible. This has the following options:
- point: Point estimates.
- bias_correction or bc: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and familiar may bootstrap the data to create them.
- bootstrap_confidence_interval or bci (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the confidence_level parameter, and familiar may bootstrap the data to create them.
Some child classes do not use this parameter.
confidence_level
(optional) Numeric value for the level at which confidence intervals are determined. When bootstraps are used to determine the confidence intervals, familiar uses the rule of thumb n = 20 / ci.level to determine the number of required bootstraps.
bootstrap_ci_method
Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
- percentile (default): Confidence intervals obtained using the percentile method.
- bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
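To make the difference between the two methods concrete, here is a minimal sketch of the textbook percentile and bias-corrected (BC) intervals as described by Efron and Hastie (2016). It is not familiar's own implementation: bootstrap_ci is a hypothetical helper, and the sample and the 400 bootstrap replicates are arbitrary choices for illustration.

```r
# Hypothetical helper (not part of familiar): textbook percentile and
# bias-corrected (BC) bootstrap confidence intervals.
bootstrap_ci <- function(boot_values, point_estimate,
                         confidence_level = 0.95,
                         method = c("percentile", "bc")) {
  method <- match.arg(method)
  alpha <- 1 - confidence_level
  probs <- c(alpha / 2, 1 - alpha / 2)

  if (method == "bc") {
    # Bias-correction constant: normal quantile of the fraction of bootstrap
    # values that do not exceed the point estimate.
    z0 <- qnorm(mean(boot_values <= point_estimate))
    # Shift the percentiles used for the interval according to z0.
    probs <- pnorm(2 * z0 + qnorm(probs))
  }

  unname(quantile(boot_values, probs = probs))
}

set.seed(1)
x <- rexp(50)  # skewed sample, so the two methods can differ
boot_means <- replicate(400, mean(sample(x, replace = TRUE)))

bootstrap_ci(boot_means, mean(x), method = "percentile")
bootstrap_ci(boot_means, mean(x), method = "bc")
```

When exactly half of the bootstrap values lie at or below the point estimate, z0 is zero and the BC interval coincides with the percentile interval.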
value_column
Identifies the column(s) in the data attribute that contain values.
grouping_column
Identifies the column(s) in the data attribute that contain identifiers used for grouping during aggregation. Familiar will automatically assign items from the identifiers attribute to the data and this attribute when combining multiple familiarDataElements of the same (child) class.
is_aggregated
Defines whether the object was aggregated.
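The roles of value_column and grouping_column during aggregation can be illustrated with a small base R sketch. The data, the column names and the use of the median are assumptions made for the example; familiar typically stores data as a data.table and applies its own aggregation logic.

```r
# Hypothetical evaluation data: one performance value per trained model.
data <- data.frame(
  learner = c("glm", "glm", "glm", "rf", "rf", "rf"),
  metric = "auc",
  value = c(0.70, 0.74, 0.72, 0.80, 0.78, 0.82)
)

# Analogues of the value_column and grouping_column slots.
value_column <- "value"
grouping_column <- c("learner", "metric")

# Aggregate the value column within each group defined by the grouping
# columns. The median is an arbitrary choice of summary for this sketch.
aggregated <- aggregate(
  data[value_column],
  by = data[grouping_column],
  FUN = median
)

aggregated
```

The result holds one row per unique combination of the grouping columns, which is the shape an aggregated element would be expected to have.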
References
Efron, B. & Hastie, T. Computer Age Statistical Inference. (Cambridge University Press, 2016).