baseline {cvms} | R Documentation
Create baseline evaluations
Description
Create a baseline evaluation of a test set.
In modelling, a baseline is a result that is meaningful to compare the results from our models to. For instance, in classification, we usually want our results to be better than random guessing. E.g. if we have three classes, we can expect an accuracy of 33.33%, as for every observation we have a 1/3 chance of guessing the correct class. So our model should achieve a higher accuracy than 33.33% before it is more useful to us than guessing.
While this expected value is often fairly straightforward to find analytically, it only represents what we can expect on average. In reality, it is possible to get far better results than that simply by guessing.
baseline() (binomial, multinomial) finds the range of likely values by evaluating multiple sets of random predictions and summarizing them with a set of useful descriptors. If random guessing frequently obtains an accuracy of 40%, perhaps our model should have better performance than this before we declare it better than guessing.
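As a small illustration of that spread (a minimal sketch that does not use baseline() itself), we can simulate the accuracy of random guessing with three classes:

# Simulate 100 rounds of random guessing on 30 observations with 3 classes
set.seed(1)
guess_accuracies <- replicate(100, {
  targets <- sample(paste0("class_", 1:3), size = 30, replace = TRUE)
  guesses <- sample(paste0("class_", 1:3), size = 30, replace = TRUE)
  mean(guesses == targets)
})
summary(guess_accuracies)  # centered near 0.33, but single rounds can be far higher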
How
When `family` is binomial: evaluates `n` sets of random predictions against the dependent variable, along with a set of all 0 predictions and a set of all 1 predictions. See also baseline_binomial().
When `family` is multinomial: creates one-vs-all (binomial) baseline evaluations for `n` sets of random predictions against the dependent variable, along with sets of "all class x,y,z,..." predictions. See also baseline_multinomial().
When `family` is gaussian: fits baseline models (y ~ 1) on `n` random subsets of `train_data` and evaluates each model on `test_data`. Also evaluates a model fitted on all rows in `train_data`. See also baseline_gaussian().
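As a rough sketch of what a single Gaussian baseline model is, here is an intercept-only model (y ~ 1) fitted on a random subset of the participant.scores data included in cvms; it simply predicts the subset mean for every test observation:

library(cvms)
set.seed(1)
train_subset <- participant.scores[sample(nrow(participant.scores), 10), ]
m0 <- lm(score ~ 1, data = train_subset)
predict(m0, newdata = participant.scores[1:5, ])  # the subset mean, repeated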
Wrapper functions
Consider using one of the wrappers, as they are simpler to use and understand: baseline_gaussian(), baseline_multinomial(), and baseline_binomial().
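For instance, the following two calls are meant to be equivalent (a sketch using the participant.scores data from cvms; `n` is kept low for speed):

library(cvms)
set.seed(1)
baseline_binomial(
  test_data = participant.scores, dependent_col = "diagnosis", n = 2
)
# ... should match:
set.seed(1)
baseline(
  test_data = participant.scores, dependent_col = "diagnosis",
  n = 2, family = "binomial"
)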
Usage
baseline(
test_data,
dependent_col,
family,
train_data = NULL,
n = 100,
metrics = list(),
positive = 2,
cutoff = 0.5,
random_generator_fn = runif,
random_effects = NULL,
min_training_rows = 5,
min_training_rows_left_out = 3,
REML = FALSE,
parallel = FALSE
)
Arguments
test_data |
data.frame. |
dependent_col |
Name of dependent variable in the supplied test and training sets. |
family |
Name of family. (Character) Currently supports "gaussian", "binomial", and "multinomial". |
train_data |
data.frame. Only used when `family` is "gaussian". |
n |
Number of random samplings to perform. (Default is 100.)
For gaussian: The number of random subsets of `train_data` to fit baseline models on.
For binomial and multinomial: The number of sets of random predictions to evaluate. |
metrics |
list for enabling/disabling metrics.
E.g. list("RMSE" = FALSE) would remove RMSE from the results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the classification results. Default values (TRUE/FALSE) are used for the remaining available metrics.
You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to the individual enabling/disabling and is useful when only a few metrics should be enabled.
The list can be created with gaussian_metrics(), binomial_metrics(), or multinomial_metrics().
Also accepts the string "all". |
positive |
Level from dependent variable to predict. Either as character (preferable) or level index (1 or 2 - alphabetically).
E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as "dog" is alphabetically the last level.
Note: For reproducibility, it's preferable to specify the name directly, as different locales may sort the levels differently.
Used when calculating confusion matrix metrics and creating ROC curves.
N.B. Only affects evaluation metrics, not the returned predictions.
N.B. Binomial only. (Character or Integer) |
cutoff |
Threshold for predicted classes. (Numeric)
N.B. Binomial only |
random_generator_fn |
Function for generating random numbers when `family` is "multinomial". The softmax function is applied to the generated numbers to transform them into probabilities.
The first argument must be the number of random numbers to generate, as no other arguments are supplied.
To test the effect of using different functions, see multiclass_probability_tibble().
N.B. Multinomial only |
random_effects |
Random effects structure for the Gaussian baseline model. (Character)
E.g. with "(1|ID)", the model becomes "y ~ 1 + (1|ID)".
N.B. Gaussian only |
min_training_rows |
Minimum number of rows in the random subsets of `train_data`.
N.B. Gaussian only. (Integer) |
min_training_rows_left_out |
Minimum number of rows left out of the random subsets of `train_data`.
I.e. a subset will maximally have the size: nrow(train_data) - min_training_rows_left_out.
N.B. Gaussian only. (Integer) |
REML |
Whether to use Restricted Maximum Likelihood. (Logical)
N.B. Gaussian only |
parallel |
Whether to run the `n` evaluations in parallel. (Logical)
Remember to register a parallel backend first. E.g. with doParallel::registerDoParallel. |
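As a sketch of the `metrics` argument, this disables all metrics and re-enables just two of them (metric names as listed under Binomial Results below; assumes cvms is attached):

baseline(
  test_data = participant.scores, dependent_col = "diagnosis",
  family = "binomial", n = 2,
  metrics = list("all" = FALSE, "Balanced Accuracy" = TRUE, "F1" = TRUE)
)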
Details
Packages used:
Models
Gaussian: stats::lm, lme4::lmer
Results
Gaussian:
r2m : MuMIn::r.squaredGLMM
r2c : MuMIn::r.squaredGLMM
AIC : stats::AIC
AICc : MuMIn::AICc
BIC : stats::BIC
Binomial and Multinomial:
ROC and related metrics:
Binomial: pROC::roc
Multinomial: pROC::multiclass.roc
Value
A list containing:
a tibble with summarized results (called `summarized_metrics`)
a tibble with random evaluations (`random_evaluations`)
a tibble with the summarized class level results (`summarized_class_level_results`) (Multinomial only)
—————————————————————-
Gaussian Results
—————————————————————-
The Summarized Results tibble contains:
Average RMSE, MAE, NRMSE(IQR), RRSE, RAE, RMSLE.
See the additional metrics (disabled by default) at ?gaussian_metrics.
The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_rows is the evaluation when the baseline model is trained on all rows in `train_data`.
The Training Rows column contains the aggregated number of rows used from `train_data` when fitting the baseline models.
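A sketch of pulling out that row (assumes a Gaussian baseline result has been assigned to `gb`, e.g. from the Gaussian call in the Examples below, and that dplyr is attached):

gb$summarized_metrics %>%
  dplyr::filter(Measure == "All_rows")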
....................................................................
The Random Evaluations tibble contains:
The non-aggregated metrics.
A nested tibble with the predictions and targets.
A nested tibble with the coefficients of the baseline models.
Number of training rows used when fitting the baseline model on the training set.
A nested Process information object with information about the evaluation.
Name of dependent variable.
Name of fixed effect (bias term only).
Random effects structure (if specified).
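A sketch of inspecting the nested predictions for the first random evaluation (again assumes `gb`; the nested column is assumed to be named Predictions):

gb$random_evaluations[["Predictions"]][[1]]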
—————————————————————-
Binomial Results
—————————————————————-
Based on the generated test set predictions, a confusion matrix and ROC curve are used to get the following:
ROC: AUC, Lower CI, and Upper CI
Note that the ROC curve is only computed when AUC is enabled.
Confusion Matrix: Balanced Accuracy, Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, Prevalence, and MCC (Matthews correlation coefficient).
....................................................................
The Summarized Results tibble contains:
The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_0 is the evaluation when all predictions are 0. The row where Measure == All_1 is the evaluation when all predictions are 1.
The aggregated metrics.
....................................................................
The Random Evaluations tibble contains:
The non-aggregated metrics.
A nested tibble with the predictions and targets.
A list of ROC curve objects (if computed).
A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.
A nested Process information object with information about the evaluation.
Name of dependent variable.
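A sketch of inspecting the first repetition's nested confusion matrix, including the Pos_ columns (assumes a binomial baseline result `bb`, e.g. from the binomial call in the Examples below; the nested column is assumed to be named "Confusion Matrix"):

bb$random_evaluations[["Confusion Matrix"]][[1]]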
—————————————————————-
Multinomial Results
—————————————————————-
Based on the generated test set predictions, one-vs-all (binomial) evaluations are performed and aggregated to get the same metrics as in the binomial results (excluding MCC, AUC, Lower CI and Upper CI), with the addition of Overall Accuracy and multiclass MCC in the summarized results. It is possible to enable multiclass AUC as well, which has been disabled by default as it is slow to calculate when there's a large set of classes.
Since we use macro-averaging, Balanced Accuracy is the macro-averaged metric, not the macro sensitivity as sometimes used.
Note: we also refer to the one-vs-all evaluations as the class level results.
....................................................................
The Summarized Results tibble contains:
Summary of the random evaluations.
How: First, the one-vs-all binomial evaluations are aggregated by repetition, then these aggregations are summarized. Besides the metrics from the binomial evaluations (see Binomial Results above), it also includes Overall Accuracy and multiclass MCC.
The Measure column indicates the statistical descriptor used on the evaluations. The Mean, Median, SD, IQR, Max, Min, NAs, and INFs measures describe the Random Evaluations tibble, while the CL_Max, CL_Min, CL_NAs, and CL_INFs describe the Class Level results.
The rows where Measure == All_<<class name>> are the evaluations when all the observations are predicted to be in that class.
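A sketch of extracting those rows (assumes the multinomial baseline result `mb` from the Examples below, with dplyr attached):

mb$summarized_metrics %>%
  dplyr::filter(grepl("^All_", Measure))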
....................................................................
The Summarized Class Level Results tibble contains:
The (nested) summarized results for each class, with the same metrics and descriptors as the Summarized Results tibble. Use tidyr::unnest on the tibble to inspect the results.
How: The one-vs-all evaluations are summarized by class.
The rows where Measure == All_0 are the evaluations when none of the observations are predicted to be in that class, while the rows where Measure == All_1 are the evaluations when all of the observations are predicted to be in that class.
....................................................................
The Random Evaluations tibble contains:
The repetition results with the same metrics as the Summarized Results tibble.
How: The one-vs-all evaluations are aggregated by repetition. If a metric contains one or more NAs in the one-vs-all evaluations, it will lead to an NA result for that repetition.
Also includes:
A nested tibble with the one-vs-all binomial evaluations (Class Level Results), including nested Confusion Matrices and the Support column, which is a count of how many observations from the class are in the test set.
A nested tibble with the predictions and targets.
A list of ROC curve objects.
A nested tibble with the multiclass confusion matrix.
A nested Process information object with information about the evaluation.
Name of dependent variable.
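A sketch of inspecting the nested one-vs-all evaluations for the first repetition (again assumes `mb`; the nested column is assumed to be named "Class Level Results"):

mb$random_evaluations[["Class Level Results"]][[1]]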
Author(s)
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
See Also
Other baseline functions:
baseline_binomial()
,
baseline_gaussian()
,
baseline_multinomial()
Examples
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()
library(tibble)
# Data is part of cvms
data <- participant.scores
# Set seed for reproducibility
set.seed(1)
# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]
# Create baseline evaluations
# Note: usually n=100 is a good setting
# Gaussian
baseline(
test_data = test_set, train_data = train_set,
dependent_col = "score", random_effects = "(1|session)",
n = 2, family = "gaussian"
)
# Binomial
baseline(
test_data = test_set, dependent_col = "diagnosis",
n = 2, family = "binomial"
)
# Multinomial
# Create some data with multiple classes
multiclass_data <- tibble(
"target" = rep(paste0("class_", 1:5), each = 10)
) %>%
dplyr::sample_n(35)
baseline(
test_data = multiclass_data,
dependent_col = "target",
n = 4, family = "multinomial"
)
# Parallelize evaluations
# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)
# Binomial
baseline(
test_data = test_set, dependent_col = "diagnosis",
n = 4, family = "binomial"
#, parallel = TRUE # Uncomment
)
# Gaussian
baseline(
test_data = test_set, train_data = train_set,
dependent_col = "score", random_effects = "(1|session)",
n = 4, family = "gaussian"
#, parallel = TRUE # Uncomment
)
# Multinomial
(mb <- baseline(
test_data = multiclass_data,
dependent_col = "target",
n = 6, family = "multinomial"
#, parallel = TRUE # Uncomment
))
# Inspect the summarized class level results
# for class_2
mb$summarized_class_level_results %>%
dplyr::filter(Class == "class_2") %>%
tidyr::unnest(Results)
# Multinomial with custom random generator function
# that creates very "certain" predictions
# (once softmax is applied)
rcertain <- function(n) {
(runif(n, min = 1, max = 100)^1.4) / 100
}
baseline(
test_data = multiclass_data,
dependent_col = "target",
n = 6, family = "multinomial",
random_generator_fn = rcertain
#, parallel = TRUE # Uncomment
)