baseline_multinomial {cvms}  R Documentation 
Create a baseline evaluation of a test set.
In modelling, a baseline is a result that is meaningful to compare the results from our models to. For instance, in classification, we usually want our results to be better than random guessing. E.g. if we have three classes, we can expect an accuracy of 33.33%, as for every observation we have a 1/3 chance of guessing the correct class. So our model should achieve a higher accuracy than 33.33% before it is more useful to us than guessing.
While this expected value is often fairly straightforward to find analytically, it
only represents what we can expect on average. In reality, it's possible to get far better
results than that by guessing.
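The gap between the expected value and what a single set of guesses can achieve can be illustrated with a small base R simulation. This is only a sketch under stated assumptions: the balanced three-class test set, the class names, and the helper `guess_accuracy()` are hypothetical and not part of cvms.

```r
# Simulate accuracy from uniformly random guessing
# on a hypothetical, balanced three-class test set
set.seed(1)
classes <- c("a", "b", "c")
targets <- rep(classes, each = 20) # 60 observations

# Accuracy of one set of random guesses (hypothetical helper)
guess_accuracy <- function() {
  guesses <- sample(classes, size = length(targets), replace = TRUE)
  mean(guesses == targets)
}

# Repeat the guessing many times
accuracies <- replicate(1000, guess_accuracy())

mean(accuracies)  # close to the analytical 1/3
range(accuracies) # single sets of guesses can land well above or below 1/3
```

The mean settles near 1/3, while the range shows why a single expected value is a weak yardstick for a small test set.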
baseline_multinomial() finds the range of likely values by evaluating multiple sets of random predictions and summarizing them with a set of useful descriptors.
Technically, it creates one-vs-all (binomial) baseline evaluations for the `n` sets of random predictions and summarizes them. Additionally, sets of "all class x,y,z,..." predictions are evaluated.
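As a rough illustration of what a one-vs-all (binomial) evaluation involves, the base R sketch below recodes multiclass targets and predictions into one binary problem per class. The toy vectors and the `one_vs_all` object are hypothetical, and this is not the package's internal implementation.

```r
# Hypothetical multiclass targets and predictions
targets <- c("class_1", "class_2", "class_3", "class_1", "class_2")
predictions <- c("class_1", "class_3", "class_3", "class_2", "class_2")

class_names <- sort(unique(targets))

# For each class, recode as 1 (that class) or 0 (any other class)
one_vs_all <- lapply(class_names, function(cl) {
  data.frame(
    class = cl,
    target = as.integer(targets == cl),
    prediction = as.integer(predictions == cl)
  )
})
names(one_vs_all) <- class_names

# Sensitivity for class_2 in its own binomial problem
c2 <- one_vs_all[["class_2"]]
sens_c2 <- sum(c2$target == 1 & c2$prediction == 1) / sum(c2$target == 1)
sens_c2
```

Each binary problem can then be scored with the usual binomial metrics, and the per-class scores are what get aggregated.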
baseline_multinomial(
  test_data,
  dependent_col,
  n = 100,
  metrics = list(),
  random_generator_fn = runif,
  parallel = FALSE
)
test_data 

dependent_col 
Name of dependent variable in the supplied test and training sets. 
n 
The number of sets of random predictions to evaluate. (Default is 100)
metrics 
E.g. You can enable/disable all metrics at once by including
The Also accepts the string 
random_generator_fn 
Function for generating random numbers.
The first argument must be the number of random numbers to generate, as no other arguments are supplied. To test the effect of using different functions,
see 
parallel 
Whether to run the evaluations in parallel. Remember to register a parallel backend first.
E.g. with 
Packages used:
Multiclass ROC curve and AUC: pROC::multiclass.roc
list containing:
a tibble with summarized results (called summarized_metrics)
a tibble with random evaluations (random_evaluations)
a tibble with the summarized class level results (summarized_class_level_results)
....................................................................
Based on the generated predictions, one-vs-all (binomial) evaluations are performed and aggregated to get the following macro metrics: Balanced Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, Kappa, Detection Rate, Detection Prevalence, and Prevalence.
In general, the metrics mentioned in binomial_metrics() can be enabled as macro metrics (excluding MCC, AUC, Lower CI, Upper CI, and the AIC/AICc/BIC metrics). These metrics also have a weighted average version.
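The difference between a macro metric and its weighted average version can be sketched in base R. The class-level F1 scores below are made up, and weighting by the per-class support (number of observations) is an assumption about how the weighted version is defined, not something stated here.

```r
# Hypothetical class-level F1 scores and supports (observations per class)
class_level <- data.frame(
  class = c("class_1", "class_2", "class_3"),
  F1 = c(0.50, 0.80, 0.20),
  support = c(10, 30, 60)
)

# Macro average: every class counts equally
macro_f1 <- mean(class_level$F1)

# Weighted average: classes count in proportion to their support (assumed)
weighted_f1 <- weighted.mean(class_level$F1, w = class_level$support)

macro_f1    # 0.5
weighted_f1 # 0.41
```

With imbalanced classes the two can differ noticeably, as the large but poorly predicted class_3 pulls the weighted value down.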
N.B. we also refer to the one-vs-all evaluations as the class level results.
In addition, the Overall Accuracy and multiclass MCC metrics are computed. Multiclass AUC can be enabled but is slow to calculate with many classes.
....................................................................
The Summarized Results tibble contains:
Summary of the random evaluations.
How: The one-vs-all binomial evaluations are aggregated by repetition and summarized. Besides the metrics from the binomial evaluations, it also includes Overall Accuracy and multiclass MCC.
The Measure column indicates the statistical descriptor used on the evaluations. The Mean, Median, SD, IQR, Max, Min, NAs, and INFs measures describe the Random Evaluations tibble, while the CL_Max, CL_Min, CL_NAs, and CL_INFs describe the Class Level results.
The rows where Measure == All_<<class name>> are the evaluations when all the observations are predicted to be in that class.
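To give a feel for the "all in one class" rows, the base R sketch below computes the overall accuracy obtained when every observation is predicted to be in a single class. The targets are hypothetical and the code does not call cvms; it only shows that such a baseline's accuracy equals that class's share of the test set.

```r
# Hypothetical test set targets: 5, 3, and 2 observations per class
targets <- rep(c("class_1", "class_2", "class_3"), times = c(5, 3, 2))

# Overall accuracy when predicting every observation as a single class
all_class_accuracy <- sapply(unique(targets), function(cl) {
  predictions <- rep(cl, length(targets))
  mean(predictions == targets)
})

all_class_accuracy
```

Here the accuracies are 0.5, 0.3, and 0.2, so always predicting the majority class already beats the 1/3 expected from uniform guessing.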
....................................................................
The Summarized Class Level Results tibble contains:
The (nested) summarized results for each class, with the same metrics and descriptors as the Summarized Results tibble. Use tidyr::unnest on the tibble to inspect the results.
How: The one-vs-all evaluations are summarized by class.
The rows where Measure == All_0 are the evaluations when none of the observations are predicted to be in that class, while the rows where Measure == All_1 are the evaluations when all of the observations are predicted to be in that class.
....................................................................
The Random Evaluations tibble contains:
The repetition results with the same metrics as the Summarized Results tibble.
How: The one-vs-all evaluations are aggregated by repetition. If a metric contains one or more NAs in the one-vs-all evaluations, it will lead to an NA result for that repetition.
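The NA behaviour matches how averaging works in plain R, as the sketch below shows. The class-level F1 values are hypothetical, and the snippet only illustrates the general principle rather than the package's aggregation code.

```r
# Hypothetical class-level F1 scores for one repetition;
# class_3 has an undefined (NA) F1 score
class_f1 <- c(class_1 = 0.6, class_2 = 0.4, class_3 = NA)

# Averaging across classes propagates the NA
macro_with_na <- mean(class_f1)

# Dropping NAs would silently average over fewer classes instead
macro_na_removed <- mean(class_f1, na.rm = TRUE)

macro_with_na    # NA
macro_na_removed # 0.5
```

Keeping the NA makes it visible that the repetition had at least one class without a defined score.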
Also includes:
A nested tibble with the one-vs-all binomial evaluations (Class Level Results), including nested Confusion Matrices and the Support column, which is a count of how many observations from the class are in the test set.
A nested tibble with the predictions and targets.
A list of ROC curve objects.
A nested tibble with the multiclass confusion matrix.
A nested Process information object with information about the evaluation.
Name of dependent variable.
Ludvig Renbo Olsen, rpkgs@ludvigolsen.dk
Other baseline functions: baseline_binomial(), baseline_gaussian(), baseline()
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()
library(tibble)
# Data is part of cvms
data <- participant.scores
# Set seed for reproducibility
set.seed(1)
# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]
# Create baseline evaluations
# Note: usually n=100 is a good setting
# Create some data with multiple classes
multiclass_data <- tibble(
  "target" = rep(paste0("class_", 1:5), each = 10)
) %>%
  dplyr::sample_n(35)
baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 4
)
# Parallelize evaluations
# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)
# Make sure to uncomment the parallel argument
(mb <- baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6
  #, parallel = TRUE # Uncomment
))
# Inspect the summarized class level results
# for class_2
mb$summarized_class_level_results %>%
  dplyr::filter(Class == "class_2") %>%
  tidyr::unnest(Results)
# Multinomial with custom random generator function
# that creates very "certain" predictions
# (once softmax is applied)
rcertain <- function(n) {
  (runif(n, min = 1, max = 100)^1.4) / 100
}
# Make sure to uncomment the parallel argument
baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6,
  random_generator_fn = rcertain
  #, parallel = TRUE # Uncomment
)
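Why rcertain() gives more "certain" predictions than runif() can be sketched in base R by applying softmax to the numbers each generator produces. The `softmax()` and `max_prob()` helpers below are hypothetical stand-ins written for this illustration, not the functions cvms uses internally.

```r
# Hypothetical softmax helper: turns numbers into probabilities
softmax <- function(x) {
  e <- exp(x - max(x)) # subtract the max for numerical stability
  e / sum(e)
}

# Same generator as in the example above
rcertain <- function(n) {
  (runif(n, min = 1, max = 100)^1.4) / 100
}

set.seed(1)
num_classes <- 5

# Average probability of the most likely class over many random rows
max_prob <- function(generator) {
  mean(replicate(1000, max(softmax(generator(num_classes)))))
}

max_prob_runif <- max_prob(runif)
max_prob_rcertain <- max_prob(rcertain)

max_prob_runif    # close to uniform: no class stands out much
max_prob_rcertain # noticeably higher: one class tends to dominate
```

Because rcertain() spreads its values over a much wider range than runif(), the softmax output concentrates more probability mass on a single class.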