baseline_gaussian {cvms}  R Documentation 
Create a baseline evaluation of a test set.
In modelling, a baseline is a result that we can meaningfully compare the results of our models to. In regression, we want our model to perform better than a model without any predictors. If our model does not perform better than such a simple model, it is unlikely to be useful.
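To make "a model without any predictors" concrete, here is a minimal base R sketch on simulated data (not data from cvms). The intercept-only `lm(y ~ 1)` predicts the training mean for every test row, and its test-set RMSE is the bar that a useful model should beat.

```r
# Toy data (assumption: simulated for illustration, not from cvms)
set.seed(1)
train <- data.frame(y = rnorm(30, mean = 50, sd = 10))
test <- data.frame(y = rnorm(10, mean = 50, sd = 10))

# Intercept-only model: no predictors, only a bias term
baseline_fit <- lm(y ~ 1, data = train)
baseline_pred <- unname(predict(baseline_fit, newdata = test))

# The single coefficient is the training mean, so every prediction equals it
all.equal(unname(coef(baseline_fit)), mean(train$y))
all.equal(baseline_pred, rep(mean(train$y), nrow(test)))

# Test-set RMSE of the baseline; a useful model should score lower on the same test set
baseline_rmse <- sqrt(mean((baseline_pred - test$y)^2))
baseline_rmse
```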
baseline_gaussian() fits the intercept-only model (y ~ 1) on `n` random subsets of `train_data` and evaluates each model on `test_data`. Additionally, it evaluates a model fitted on all rows in `train_data`.
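The procedure above can be sketched in base R as follows. `sketch_baseline()` is a hypothetical helper, not the cvms implementation: it only computes RMSE, ignores `random_effects`, `metrics` and `parallel`, and assumes that each subset size is drawn uniformly between `min_training_rows` and `nrow(train_data) - min_training_rows_left_out`.

```r
# Hypothetical helper (not part of cvms): a simplified baseline procedure
sketch_baseline <- function(test_data, train_data, dependent_col, n = 100,
                            min_training_rows = 5,
                            min_training_rows_left_out = 3) {
  # Intercept-only formula, e.g. score ~ 1
  f <- stats::reformulate("1", response = dependent_col)
  eval_rmse <- function(fit) {
    pred <- stats::predict(fit, newdata = test_data)
    sqrt(mean((pred - test_data[[dependent_col]])^2))
  }
  max_rows <- nrow(train_data) - min_training_rows_left_out
  stopifnot(max_rows >= min_training_rows)
  random_evaluations <- do.call(rbind, lapply(seq_len(n), function(i) {
    # Assumption: subset size drawn uniformly from min_training_rows..max_rows
    size <- min_training_rows + sample.int(max_rows - min_training_rows + 1, 1) - 1
    rows <- sample.int(nrow(train_data), size)
    fit <- stats::lm(f, data = train_data[rows, , drop = FALSE])
    data.frame(RMSE = eval_rmse(fit), training_rows = size)
  }))
  # Baseline fitted on all training rows (the "All_rows" evaluation)
  list(random_evaluations = random_evaluations,
       all_rows_rmse = eval_rmse(stats::lm(f, data = train_data)))
}

# Usage on simulated data
set.seed(1)
df <- data.frame(score = rnorm(40, mean = 50, sd = 10))
res <- sketch_baseline(df[1:10, , drop = FALSE], df[11:40, , drop = FALSE],
                       "score", n = 5)
res$random_evaluations
res$all_rows_rmse
```

Sampling the size with `sample.int()` rather than `sample(a:b, 1)` avoids R's special case where `sample()` on a length-one vector draws from `1:x` instead.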
baseline_gaussian(
test_data,
train_data,
dependent_col,
n = 100,
metrics = list(),
random_effects = NULL,
min_training_rows = 5,
min_training_rows_left_out = 3,
REML = FALSE,
parallel = FALSE
)
test_data
data.frame.
train_data
data.frame.
dependent_col 
Name of dependent variable in the supplied test and training sets. 
n
The number of random samplings of `train_data` to fit baseline models on. (Default is 100)
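metrics
`list` for enabling/disabling metrics.
E.g. `list("RMSE" = FALSE)` would remove RMSE from the results. You can enable/disable all metrics at once by including `"all" = TRUE/FALSE` in the list. This is done prior to enabling/disabling individual metrics, so `list("all" = FALSE, "RMSE" = TRUE)` would return only the RMSE metric.
The list can be created with `gaussian_metrics()`.
Also accepts the string `"all"`.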
random_effects
Random effects structure for the baseline model. (Character) E.g. with `"(1|ID)"`, the model becomes `"y ~ 1 + (1|ID)"`.
min_training_rows
Minimum number of rows in the random subsets of `train_data`.
min_training_rows_left_out
Minimum number of rows left out of the random subsets of `train_data`. I.e. a subset will maximally have the size:
max_rows_in_subset = nrow(train_data) - min_training_rows_left_out
REML 
Whether to use Restricted Maximum Likelihood. (Logical) 
parallel
Whether to run the `n` evaluations in parallel. (Logical)
Remember to register a parallel backend first.
E.g. with `doParallel::registerDoParallel`.
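As a rough illustration of the `random_effects`, `min_training_rows` and `min_training_rows_left_out` arguments, the sketch below builds the baseline formula and the allowed range of subset sizes. Both `baseline_formula()` and `subset_size_range()` are hypothetical helpers; actually fitting the mixed-effects formula would require a package such as lme4, which is not used here.

```r
# Hypothetical helper (not part of cvms): the baseline formula with an
# optional random effects string, as described under random_effects
baseline_formula <- function(dependent_col, random_effects = NULL) {
  rhs <- if (is.null(random_effects)) "1" else paste("1 +", random_effects)
  stats::as.formula(paste(dependent_col, "~", rhs))
}

# Hypothetical helper: smallest and largest allowed subset sizes
subset_size_range <- function(n_train_rows, min_training_rows = 5,
                              min_training_rows_left_out = 3) {
  c(min = min_training_rows, max = n_train_rows - min_training_rows_left_out)
}

deparse(baseline_formula("score"))                 # "score ~ 1"
deparse(baseline_formula("score", "(1|session)"))  # "score ~ 1 + (1 | session)"
subset_size_range(30)                              # min = 5, max = 27
```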
Packages used:
r2m : MuMIn::r.squaredGLMM
r2c : MuMIn::r.squaredGLMM
AIC : stats::AIC
AICc : MuMIn::AICc
BIC : stats::BIC
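For the intercept-only linear model, the information criteria can be reproduced from the log-likelihood. This base R sketch on simulated data checks `stats::AIC` and `stats::BIC` against their definitions and computes AICc with the standard small-sample correction; that `MuMIn::AICc` uses exactly this formula is an assumption here and is not checked.

```r
# Simulated training data (assumption: illustration only)
set.seed(1)
train <- data.frame(y = rnorm(25, mean = 10, sd = 2))
fit <- lm(y ~ 1, data = train)

ll <- logLik(fit)
k <- attr(ll, "df")  # 2 estimated parameters: the intercept and the residual SD
n <- nobs(fit)

aic <- -2 * as.numeric(ll) + 2 * k
bic <- -2 * as.numeric(ll) + log(n) * k
# Standard AICc correction (assumed to match MuMIn::AICc)
aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)

all.equal(aic, AIC(fit))
all.equal(bic, BIC(fit))
aicc
```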
`list` containing:
a `tibble` with summarized results (called `summarized_metrics`)
a `tibble` with random evaluations (`random_evaluations`)
....................................................................
The Summarized Results tibble contains:
Average RMSE, MAE, NRMSE(IQR), RRSE, RAE, RMSLE.
See the additional metrics (disabled by default) at ?gaussian_metrics.
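The listed metrics have standard textbook definitions, sketched below in base R. `gaussian_sketch_metrics()` is a hypothetical helper, and it is an assumption that cvms computes each metric exactly this way (for instance, NRMSE(IQR) is taken here as RMSE divided by the interquartile range of the targets, using the `stats::IQR()` defaults). Predicting the target mean everywhere gives RRSE = RAE = 1, which makes these relative metrics convenient when comparing against a baseline.

```r
# Hypothetical helper (not part of cvms); standard definitions assumed
gaussian_sketch_metrics <- function(predictions, targets) {
  err <- predictions - targets
  rmse <- sqrt(mean(err^2))
  c(
    "RMSE" = rmse,
    "MAE" = mean(abs(err)),
    "NRMSE(IQR)" = rmse / stats::IQR(targets),
    # Relative to always predicting the mean of the targets
    "RRSE" = sqrt(sum(err^2) / sum((targets - mean(targets))^2)),
    "RAE" = sum(abs(err)) / sum(abs(targets - mean(targets))),
    "RMSLE" = sqrt(mean((log1p(predictions) - log1p(targets))^2))
  )
}

targets <- c(2, 4, 6, 8)
# Predicting the target mean (5) for every row
m <- gaussian_sketch_metrics(rep(mean(targets), 4), targets)
m  # RRSE and RAE are both 1
```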
The Measure column indicates the statistical descriptor used on the evaluations.
The row where Measure == All_rows is the evaluation when the baseline model is trained on all rows in `train_data`.
The Training Rows column contains the aggregated number of rows used from `train_data` when fitting the baseline models.
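How the random evaluations might be collapsed into one row per descriptor can be sketched as follows. `summarize_evaluations()` is hypothetical, and the set of descriptors shown (Mean, Median, SD, IQR, Max, Min) is an assumption for illustration; only the `All_rows` row is described above.

```r
# Hypothetical helper (not part of cvms). Assumption: the descriptor set
# below is illustrative; only "All_rows" is documented above.
summarize_evaluations <- function(random_values, all_rows_value, metric = "RMSE") {
  out <- data.frame(
    Measure = c("Mean", "Median", "SD", "IQR", "Max", "Min", "All_rows"),
    value = c(mean(random_values), stats::median(random_values),
              stats::sd(random_values), stats::IQR(random_values),
              max(random_values), min(random_values), all_rows_value),
    stringsAsFactors = FALSE
  )
  names(out)[2] <- metric
  out
}

# Toy RMSE values from three random baseline fits, plus the all-rows fit
s <- summarize_evaluations(c(1, 2, 3), all_rows_value = 2.5)
s
```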
....................................................................
The Random Evaluations tibble contains:
The non-aggregated metrics.
A nested tibble with the predictions and targets.
A nested tibble with the coefficients of the baseline models.
Number of training rows used when fitting the baseline model on the training set.
A nested Process information object with information about the evaluation.
Name of dependent variable.
Name of fixed effect (bias term only).
Random effects structure (if specified).
Ludvig Renbo Olsen, rpkgs@ludvigolsen.dk
Other baseline functions: baseline_binomial(), baseline_multinomial(), baseline()
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()
# Data is part of cvms
data <- participant.scores
# Set seed for reproducibility
set.seed(1)
# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]
# Create baseline evaluations
# Note: usually n=100 is a good setting
baseline_gaussian(
test_data = test_set,
train_data = train_set,
dependent_col = "score",
random_effects = "(1|session)",
n = 2
)
# Parallelize evaluations
# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)
# Make sure to uncomment the parallel argument
baseline_gaussian(
test_data = test_set,
train_data = train_set,
dependent_col = "score",
random_effects = "(1|session)",
n = 4
#, parallel = TRUE # Uncomment
)