| familiarDataElement-class {familiar} | R Documentation |
Data container for evaluation data.
Description
Most attributes of the familiarData object are objects of the familiarDataElement class. This (super-)class is used to allow for standardised aggregation and processing of evaluation data.
Slots
data: Evaluation data, typically a data.table or list.
identifiers: Identifiers of the data, e.g. the generating model name, learner, etc.
detail_level: Sets the level at which results are computed and aggregated.
- ensemble: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.
- hybrid (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to ensemble, where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.
- model: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.
Note that each level of detail has a different interpretation for bootstrap confidence intervals. For ensemble and model, these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by a respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For hybrid, it represents the range where any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using hybrid are at least as wide as those for ensemble. hybrid offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified, model.
Some child classes do not use this parameter.
estimation_type: Sets the type of estimation that should be possible. This has the following options:
- point: Point estimates.
- bias_correction or bc: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and familiar may bootstrap the data to create them.
- bootstrap_confidence_interval or bci (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the confidence_level parameter, and familiar may bootstrap the data to create them.
Some child classes do not use this parameter.
confidence_level: (optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, familiar uses the rule of thumb n = 20 / ci.level to determine the number of required bootstraps.
bootstrap_ci_method: Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
- percentile (default): Confidence intervals obtained using the percentile method.
- bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
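The two implemented methods can be illustrated with a minimal sketch in base R. This is not familiar's implementation: the function bootstrap_ci and its arguments are hypothetical names introduced for illustration, and the bias-corrected variant follows the textbook construction, which shifts the percentiles by a bias-correction constant z0 derived from the fraction of bootstrap estimates below the point estimate.

```r
# Sketch of percentile and bias-corrected (BC) bootstrap confidence intervals.
# Hypothetical helper, not part of familiar.
bootstrap_ci <- function(boot_values, point_estimate, confidence_level = 0.95,
                         method = c("percentile", "bc")) {
  method <- match.arg(method)
  alpha <- 1 - confidence_level
  probs <- c(alpha / 2, 1 - alpha / 2)

  if (method == "bc") {
    # Bias-correction constant: normal quantile of the fraction of bootstrap
    # estimates that fall below the point estimate.
    z0 <- stats::qnorm(mean(boot_values < point_estimate))
    # Shift the percentiles that are read off the bootstrap distribution.
    probs <- stats::pnorm(2 * z0 + stats::qnorm(probs))
  }

  ci <- stats::quantile(boot_values, probs = probs, names = FALSE)
  c(lower = ci[1], upper = ci[2])
}

# Toy example: bootstrap the median of a skewed sample.
set.seed(1)
x <- rexp(200)
boot_medians <- replicate(400, median(sample(x, replace = TRUE)))

bootstrap_ci(boot_medians, median(x), method = "percentile")
bootstrap_ci(boot_medians, median(x), method = "bc")
```

When exactly half of the bootstrap estimates lie below the point estimate, z0 is zero and both methods return the same interval; the more asymmetric the bootstrap distribution is around the point estimate, the more the two intervals differ.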
value_column: Identifies column(s) in the data attribute presenting values.
grouping_column: Identifies column(s) in the data attribute presenting identifier columns for grouping during aggregation. Familiar will automatically assign items from the identifiers attribute to the data and this attribute when combining multiple familiarDataElements of the same (child) class.
is_aggregated: Defines whether the object was aggregated.
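To show how the slots listed above fit together, the following sketch defines a stand-in S4 class with the same slot names. It is not familiar's class definition: the class name exampleDataElement, the slot types, and the toy data are all assumptions made for illustration. Only the slot names and the documented defaults (hybrid, bootstrap_confidence_interval, percentile) come from this page.

```r
library(methods)

# Hypothetical stand-in class; slot types are assumptions, not familiar's own.
setClass(
  "exampleDataElement",
  slots = c(
    data = "ANY",
    identifiers = "ANY",
    detail_level = "character",
    estimation_type = "character",
    confidence_level = "ANY",
    bootstrap_ci_method = "character",
    value_column = "character",
    grouping_column = "character",
    is_aggregated = "logical"
  ),
  prototype = list(
    data = NULL,
    identifiers = NULL,
    detail_level = "hybrid",
    estimation_type = "bootstrap_confidence_interval",
    confidence_level = NULL,
    bootstrap_ci_method = "percentile",
    value_column = NA_character_,
    grouping_column = NA_character_,
    is_aggregated = FALSE
  )
)

# Toy element: one value column and one grouping column in the data.
element <- new(
  "exampleDataElement",
  data = data.frame(model = c("a", "b"), auc = c(0.81, 0.78)),
  identifiers = list(learner = "glm"),
  value_column = "auc",
  grouping_column = "model"
)

element@detail_level
element@is_aggregated
```

Here value_column and grouping_column name columns of the data slot, which is how the slot descriptions relate them; how familiar populates and aggregates these slots internally is not shown.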
References
Efron, B. & Hastie, T. Computer Age Statistical Inference. (Cambridge University Press, 2016).