sample_experiment_data {auctestr} | R Documentation |
Performance of several predictive models over three different datasets, using multiple cutoff points for time within each dataset.
Description
A dataset containing the performance of several predictive models over three different datasets, where models are built using data from multiple time points (where time 1 has less data than time 2, but each subsequent time point T also uses data from all prior time points up to that time t<= T.) This represents the typical output of a machine learning experiment where several models are being considered across multiple datasets, often with different variants of each model type being considered (i.e., different hyperparameter settings of each model).
Usage
sample_experiment_data
Format
A data frame with 180 rows and 10 variables:
- auc
Area Under the Receiver Operating Characteristic Curve, or AUC, for this model configuration.
- precision
Precision for this model configuration.
- accuracy
Accuracy for this model configuration.
- n
Number of observations in this dataset.
- n_n
Number of negative observations (i.e., outcome == 0) in this dataset (required for standard error estimation of AUC statistic).
- n_p
Number of positive observations (i.e., outcome == 1) in this dataset (required for standard error estimation of AUC statistic).
- dataset
indicator for different datasets.
- time
indicator for different time points used to build each dataset; these represent dependent observations of model performance.
- model_id
Indicator for the statistical algorithm used (this could be 'Logistic Regression', 'SVM', etc.).
- model_variant
Indicator for different variants of each model which are not equivalent and should be used individually (model should not be averaged over these, and instead should be held fixed when comparing to other model). Example of this could be various hyperparameter settings for a given model (i.e., cost for an SVM).