export_risk_stratification_data {familiar} | R Documentation |
Extract and export sample risk group stratification and associated
tests.
Description
Extract and export sample risk group stratification and
associated tests for data in a familiarCollection.
Usage
export_risk_stratification_data(
object,
dir_path = NULL,
export_strata = TRUE,
time_range = NULL,
export_collection = FALSE,
...
)
## S4 method for signature 'familiarCollection'
export_risk_stratification_data(
object,
dir_path = NULL,
export_strata = TRUE,
time_range = NULL,
export_collection = FALSE,
...
)
## S4 method for signature 'ANY'
export_risk_stratification_data(
object,
dir_path = NULL,
export_strata = TRUE,
time_range = NULL,
export_collection = FALSE,
...
)
Arguments
object |
A familiarCollection object, or other objects from which
a familiarCollection can be extracted. See details for more information.
|
dir_path |
Path to folder where extracted data should be saved. NULL
will allow export as a structured list of data.tables.
|
export_strata |
Flag that determines whether the raw data or strata are
exported.
|
time_range |
Time range for which strata should be created. If NULL,
the full time range is used.
|
export_collection |
(optional) Exports the collection if TRUE.
|
... |
Arguments passed on to extract_risk_stratification_data, as_familiar_collection
data A dataObject object, data.table or data.frame that
constitutes the data that are assessed.
is_pre_processed Flag that indicates whether the data was already
pre-processed externally, e.g. normalised and clustered. Only used if the
data argument is a data.table or data.frame.
cl Cluster created using the parallel package. This cluster is then
used to speed up computation through parallelisation.
ensemble_method Method for ensembling predictions from models for the
same sample. Available methods are:
verbose Flag to indicate whether feedback should be provided on the
computation and extraction of various data elements.
message_indent Number of indentation steps for messages shown during
computation and extraction of various data elements.
detail_level (optional) Sets the level at which results are computed
and aggregated.
-
ensemble : Results are computed at the ensemble level, i.e. over all
models in the ensemble. This means that, for example, bias-corrected
estimates of model performance are assessed by creating (at least) 20
bootstraps and computing the model performance of the ensemble model for
each bootstrap.
-
hybrid (default): Results are computed at the level of models in an
ensemble. This means that, for example, bias-corrected estimates of model
performance are directly computed using the models in the ensemble. If there
are at least 20 trained models in the ensemble, performance is computed for
each model, in contrast to ensemble where performance is computed for the
ensemble of models. If there are fewer than 20 trained models in the
ensemble, bootstraps are created so that at least 20 point estimates can be
made.
-
model : Results are computed at the model level. This means that, for
example, bias-corrected estimates of model performance are assessed by
creating (at least) 20 bootstraps and computing the performance of the model
for each bootstrap.
Note that each level of detail has a different interpretation for bootstrap
confidence intervals. For ensemble and model these are the confidence
intervals for the ensemble and an individual model, respectively. That is,
the confidence interval describes the range where an estimate produced by a
respective ensemble or model trained on a repeat of the experiment may be
found with the probability of the confidence level. For hybrid, it
represents the range where any single model trained on a repeat of the
experiment may be found with the probability of the confidence level. By
definition, confidence intervals obtained using hybrid are at least as
wide as those for ensemble. hybrid offers the correct interpretation if
the goal of the analysis is to assess the result of a single, unspecified,
model.
hybrid is generally computationally less expensive than ensemble, which
in turn is somewhat less expensive than model.
A non-default detail_level parameter can be specified for separate
evaluation steps by providing a parameter value in a named list with data
elements, e.g. list("auc_data"="ensemble", "model_performance"="hybrid").
This parameter can be set for the following data elements: auc_data,
decision_curve_analyis, model_performance, permutation_vimp,
ice_data, prediction_data and confusion_matrix.
confidence_level (optional) Numeric value for the level at which
confidence intervals are determined. If bootstraps are used to determine
the confidence intervals, familiar uses the rule of thumb
n = 20 / ci.level to determine the number of required bootstraps.
The default value is 0.95.
familiar_data_names Names of the dataset(s). Only used if the object parameter
is one or more familiarData objects.
collection_name Name of the collection.
|
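The named-list form of detail_level described above can be sketched in plain R. The helper validate_detail_level is hypothetical and not part of familiar. It only checks a list against the element names and levels given in the text, and the spelling decision_curve_analyis is copied as written there.

```r
# Data elements for which the text above says detail_level can be set.
# Names are copied verbatim from the documentation.
supported_elements <- c(
  "auc_data", "decision_curve_analyis", "model_performance",
  "permutation_vimp", "ice_data", "prediction_data", "confusion_matrix"
)

# Levels described above.
supported_levels <- c("ensemble", "hybrid", "model")

# Hypothetical helper (an assumption, not a familiar function): checks that a
# named list only uses documented data elements and documented levels.
validate_detail_level <- function(detail_level) {
  stopifnot(is.list(detail_level), !is.null(names(detail_level)))

  unknown_elements <- setdiff(names(detail_level), supported_elements)
  if (length(unknown_elements) > 0) {
    stop("Unknown data element(s): ", paste(unknown_elements, collapse = ", "))
  }

  unknown_levels <- setdiff(unlist(detail_level), supported_levels)
  if (length(unknown_levels) > 0) {
    stop("Unknown detail level(s): ", paste(unknown_levels, collapse = ", "))
  }

  invisible(detail_level)
}

# The example list from the text above.
detail_level <- validate_detail_level(
  list("auc_data" = "ensemble", "model_performance" = "hybrid")
)
```

A list checked this way would then be supplied as the detail_level argument, which according to the text is only used when object is not already a familiarCollection.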
Details
Data is usually collected from a familiarCollection object. However, you can
also provide one or more familiarData objects, which will be internally
converted to a familiarCollection object. It is also possible to provide a
familiarEnsemble or one or more familiarModel objects together with the data
from which the exported data are computed. Paths to files containing these
objects can also be provided.
All parameters aside from object and dir_path are only used if object is not
a familiarCollection object, or a path to one.
Three tables are exported in a list:
-
data : Contains the assigned risk group for a given sample, along with
its reported survival time and censoring status.
-
hr_ratio : Contains the hazard ratio between different risk groups.
-
logrank : Contains the results from the logrank test between different
risk groups.
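As a minimal sketch of working with the exported list, the helper below checks that the three elements named above are present and reports their row counts. The helper check_exported_tables is hypothetical and not part of familiar. It assumes the returned list uses the element names data, hr_ratio and logrank exactly as listed, and the commented call assumes an existing familiarCollection stored in a variable named collection.

```r
# Element names as listed above.
expected_tables <- c("data", "hr_ratio", "logrank")

# Hypothetical helper (an assumption, not a familiar function): verifies that
# the documented tables are present and returns the number of rows in each.
check_exported_tables <- function(exported) {
  missing_tables <- setdiff(expected_tables, names(exported))
  if (length(missing_tables) > 0) {
    stop("Missing table(s): ", paste(missing_tables, collapse = ", "))
  }

  # nrow() works for data.frame and data.table alike.
  vapply(exported[expected_tables], nrow, integer(1))
}

# Hypothetical usage, assuming `collection` is an existing familiarCollection.
# Leaving dir_path as NULL returns the tables instead of writing files:
# exported <- familiar::export_risk_stratification_data(
#   object = collection,
#   dir_path = NULL
# )
# check_exported_tables(exported)
```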
Value
A list of data.tables if dir_path is not provided. Otherwise nothing is
returned, and all data are exported to csv files.
[Package familiar version 1.4.8 Index]