Evaluate your model's predictions on a set of evaluation metrics.

Create ID-aggregated evaluations by multiple methods.

Currently supports regression and classification
(binary and multiclass). See `type`.

```
evaluate(
  data,
  target_col,
  prediction_cols,
  type,
  id_col = NULL,
  id_method = "mean",
  apply_softmax = FALSE,
  cutoff = 0.5,
  positive = 2,
  metrics = list(),
  include_predictions = TRUE,
  parallel = FALSE,
  models = deprecated()
)
```

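`data`

## Multinomial

When `type` is `"multinomial"`, the predictions can be passed in one of two formats.

### Probabilities (Preferable)

One column per class with the probability of that class. The columns should have the name of their class, as they are named in the target column.

### Classes

A single column of type `character` with the predicted classes.

## Binomial

When `type` is `"binomial"`, the predictions can be passed in one of two formats.

### Probabilities (Preferable)

One column with the probability of the class being the second class alphabetically (`1` if the classes are `0` and `1`).

Note: For the alphabetical ordering of the class labels, they are treated as type `character`, so e.g. `100` would come before `7`.

### Classes

A single column of type `character` with the predicted classes.

Note: The prediction column will be converted to the probability `0.0` for the first class alphabetically and `1.0` for the second class alphabetically.

## Gaussian

When `type` is `"gaussian"`, the predictions should be passed as one column with the predicted values.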

`target_col`

Name of the column with the true classes/values in `data`.

When `type` is `"multinomial"`, this column should contain the class names, not their indices.

`prediction_cols`

Name(s) of column(s) with the predictions. Columns can be either numeric or character depending on which format is chosen.
See the description of `data` for the possible formats.

`type`

Type of evaluation to perform:

`"gaussian"` for regression.

`"binomial"` for binary classification.

`"multinomial"` for multiclass classification.

`id_col`

Name of ID column to aggregate predictions by.

N.B. Current methods assume that the target class/value is constant within the IDs.

N.B. When aggregating by ID, some metrics may be disabled.

`id_method`

Method to use when aggregating predictions by ID.
Either `"mean"` or `"majority"`.

When `type` is `"gaussian"`, only the `"mean"` method is available.

## mean

The average prediction (value or probability) is calculated per ID and evaluated. This method assumes that the target class/value is constant within the IDs.

## majority

The most predicted class per ID is found and evaluated. In case of a tie,
the winning classes share the probability (e.g. `P = 0.5` each when there are two majority classes).

`apply_softmax`

Whether to apply the softmax function to the
prediction columns when `type` is `"multinomial"`.

N.B. **Multinomial models only**.

`cutoff`

Threshold for predicted classes. (Numeric)

N.B. **Binomial models only**.

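`positive`

Level from dependent variable to predict.
Either as character (preferable) or level index (`1` or `2`, alphabetically).

E.g. if we have the levels `"cat"` and `"dog"` and we want `"dog"` to be the positive class, we can either provide `"dog"` or `2`, as alphabetically, `"dog"` comes after `"cat"`.

Used when calculating confusion matrix metrics and creating `ROC` curves.

The **Process** column in the output can be used to verify this setting.

N.B. Only affects the evaluation metrics.

N.B. **Binomial models only**.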

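`metrics`

`list` for enabling/disabling metrics.

E.g. `list("RMSE" = FALSE)` would remove `RMSE` from the regression results, and `list("Accuracy" = TRUE)` would add the regular `Accuracy` metric to the classification results. Default values (`TRUE`/`FALSE`) will be used for the remaining available metrics.

You can enable/disable all metrics at once by including `"all" = TRUE/FALSE` in the `list`. This is done prior to enabling/disabling individual metrics, so e.g. `list("all" = FALSE, "RMSE" = TRUE)` would return only the `RMSE` metric.

The `list` can be created with `gaussian_metrics()`, `binomial_metrics()`, or `multinomial_metrics()`.

Also accepts the string `"all"`.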

`include_predictions`

Whether to include the predictions
in the output as a nested `tibble`. (Logical)

`parallel`

Whether to run evaluations in parallel,
when `data` is grouped with `group_by`.

`models`

Deprecated.
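The two `id_method` options can give different answers for the same ID. The base R sketch below illustrates the idea on a hypothetical binomial data frame (`preds`, with made-up probabilities). It assumes each observation is first classified with `cutoff` before the majority vote; it is not the package's internal implementation.

```r
# Hypothetical predictions: three IDs with a probability per observation
preds <- data.frame(
  id = c("a", "a", "a", "b", "b", "b", "c", "c"),
  prob = c(0.9, 0.4, 0.3, 0.2, 0.8, 0.7, 0.9, 0.1)
)
cutoff <- 0.5

# "mean": average the probabilities within each ID
mean_prob <- tapply(preds$prob, preds$id, mean)

# "majority": classify each observation (assumption: using the cutoff),
# then find the most predicted class per ID
preds$class <- as.integer(preds$prob > cutoff)
majority_prob <- tapply(preds$class, preds$id, function(classes) {
  tab <- table(factor(classes, levels = c(0, 1)))
  winners <- names(tab)[as.vector(tab) == max(tab)]
  # Probability assigned to class 1: tied winners share the probability
  if ("1" %in% winners) 1 / length(winners) else 0
})

print(round(mean_prob, 3))
print(majority_prob)
```

For ID `a`, the mean probability is above the cutoff while the majority vote picks the other class, which is why the choice of `id_method` can matter. ID `c` is a tie, so each class gets `0.5`.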

Packages used:

**Binomial** and **Multinomial**:

`ROC` and `AUC`:

Binomial: `pROC::roc`

Multinomial: `pROC::multiclass.roc`

## Gaussian Results

`tibble` containing the following metrics by default:

Average **RMSE**, `MAE`, `NRMSE(IQR)`, `RRSE`, `RAE`, and `RMSLE`.

See the additional metrics (disabled by default) at `?gaussian_metrics`.

Also includes:

A nested `tibble` with the **Predictions** and targets.

A nested **Process** information object with information
about the evaluation.
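As a rough guide to what the default regression metrics measure, the following base R sketch computes them from textbook definitions on made-up `targets` and `predictions`. The formulas are assumptions based on the usual definitions of these metrics; the package's own implementation may differ in details.

```r
# Hypothetical targets and predictions
targets <- c(3.0, 5.0, 2.5, 7.0)
predictions <- c(2.5, 5.0, 4.0, 8.0)
residuals <- predictions - targets

# Root mean square error and mean absolute error
rmse <- sqrt(mean(residuals^2))
mae <- mean(abs(residuals))

# RMSE normalized by the interquartile range of the targets
nrmse_iqr <- rmse / IQR(targets)

# Errors relative to always predicting the mean of the targets
rrse <- sqrt(sum(residuals^2) / sum((targets - mean(targets))^2))
rae <- sum(abs(residuals)) / sum(abs(targets - mean(targets)))

# Root mean square log error (requires non-negative values)
rmsle <- sqrt(mean((log1p(predictions) - log1p(targets))^2))

print(c(
  RMSE = rmse, MAE = mae, `NRMSE(IQR)` = nrmse_iqr,
  RRSE = rrse, RAE = rae, RMSLE = rmsle
))
```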

## Binomial Results

`tibble` with the following evaluation metrics, based on a
`confusion matrix` and a `ROC` curve fitted to the predictions:

`Confusion Matrix`:

**Balanced Accuracy**, `Accuracy`, `F1`, `Sensitivity`, `Specificity`, `Positive Predictive Value`, `Negative Predictive Value`, `Kappa`, `Detection Rate`, `Detection Prevalence`, `Prevalence`, and `MCC`.

`ROC`:

**AUC**, `Lower CI`, and `Upper CI`.

Note that the `ROC` curve is only computed if `AUC` is enabled. See `metrics`.

Also includes:

A nested `tibble` with the **predictions** and targets.

A `list` of **ROC** curve objects (if computed).

A nested `tibble` with the **confusion matrix**.
The `Pos_` columns tell you whether a row is a
True Positive (`TP`), True Negative (`TN`),
False Positive (`FP`), or False Negative (`FN`),
depending on which level is the "`positive`" class,
i.e. the level you wish to predict.

A nested **Process** information object with information
about the evaluation.
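To make the roles of `cutoff` and `positive` concrete, here is a small base R sketch that derives the `TP`/`TN`/`FP`/`FN` counts and a few of the listed metrics by hand. The data are made up, class `1` is assumed to be the positive class, and the formulas are the standard textbook ones rather than the package's code.

```r
# Hypothetical targets (1 = positive class) and predicted probabilities
targets <- c(1, 1, 1, 0, 0, 0, 0, 0)
probs <- c(0.8, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.45)
cutoff <- 0.5

# Probabilities above the cutoff are predicted as the positive class
predicted <- as.integer(probs > cutoff)

# Confusion matrix counts
tp <- sum(predicted == 1 & targets == 1)
tn <- sum(predicted == 0 & targets == 0)
fp <- sum(predicted == 1 & targets == 0)
fn <- sum(predicted == 0 & targets == 1)

# A few of the confusion matrix metrics
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
pos_pred_value <- tp / (tp + fp)
balanced_accuracy <- (sensitivity + specificity) / 2
accuracy <- (tp + tn) / length(targets)
f1 <- 2 * pos_pred_value * sensitivity / (pos_pred_value + sensitivity)

print(c(TP = tp, TN = tn, FP = fp, FN = fn))
print(c(
  `Balanced Accuracy` = balanced_accuracy, Accuracy = accuracy,
  F1 = f1, Sensitivity = sensitivity, Specificity = specificity
))
```

Swapping which level counts as positive exchanges `TP` with `TN` and `FP` with `FN`, so metrics such as `Sensitivity` and `F1` change while `Accuracy` does not.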

## Multinomial Results

For each class, a *one-vs-all* binomial evaluation is performed. This creates
a **Class Level Results** `tibble` containing the same metrics as the binomial results
described above (excluding `Accuracy`, `MCC`, `AUC`, `Lower CI` and `Upper CI`),
along with a count of the class in the target column (**Support**).
These metrics are used to calculate the macro metrics. The nested class level results
`tibble` is also included in the output `tibble`,
and could be reported along with the macro and overall metrics.

The output `tibble` contains the macro and overall metrics.
The metrics that share their name with the metrics in the nested
class level results `tibble` are averages of those metrics
(note: `NA`s are not removed before averaging).
In addition to these, it also includes the **Overall Accuracy** and
the multiclass `MCC`.

Other available metrics (disabled by default, see `metrics`):
**Accuracy**, `AUC`, `Weighted Balanced Accuracy`, `Weighted Accuracy`, `Weighted F1`, `Weighted Sensitivity`, `Weighted Specificity`, `Weighted Pos Pred Value`, `Weighted Neg Pred Value`, `Weighted Kappa`, `Weighted Detection Rate`, `Weighted Detection Prevalence`, and `Weighted Prevalence`.

Note that the "Weighted" average metrics are weighted by the `Support`.

When having a large set of classes, consider keeping `AUC` disabled.

Also includes:

A nested `tibble` with the **Predictions** and targets.

A `list` of **ROC** curve objects when `AUC` is enabled.

A nested `tibble` with the multiclass **Confusion Matrix**.

A nested **Process** information object with information
about the evaluation.

Besides the binomial evaluation metrics and the `Support`,
the nested class level results `tibble` also contains a
nested `tibble` with the **Confusion Matrix** from the one-vs-all evaluation.
The `Pos_` columns tell you whether a row is a
True Positive (`TP`), True Negative (`TN`),
False Positive (`FP`), or False Negative (`FN`),
depending on which level is the "positive" class. In our case, `1` is the current class
and `0` represents all the other classes together.
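The one-vs-all scheme and the difference between macro and `Support`-weighted averages can be sketched in base R as follows. The class labels and predictions are made up, only `Sensitivity` is computed to keep the example short, and the code is an illustration of the idea rather than the package's implementation.

```r
# Hypothetical targets and predicted classes
targets <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c")
predicted <- c("a", "a", "a", "b", "b", "b", "c", "c", "a", "a")
classes <- sort(unique(targets))

# One-vs-all: 1 is the current class, 0 is all other classes together
class_level <- do.call(rbind, lapply(classes, function(cl) {
  is_target <- as.integer(targets == cl)
  is_predicted <- as.integer(predicted == cl)
  tp <- sum(is_predicted == 1 & is_target == 1)
  fn <- sum(is_predicted == 0 & is_target == 1)
  data.frame(
    Class = cl,
    Sensitivity = tp / (tp + fn),
    Support = sum(is_target) # count of the class in the targets
  )
}))

# Macro metric: plain average over classes (NAs are not removed)
macro_sensitivity <- mean(class_level$Sensitivity)

# Weighted metric: average weighted by the Support
weighted_sensitivity <- weighted.mean(
  class_level$Sensitivity,
  w = class_level$Support
)

print(class_level)
print(c(Macro = macro_sensitivity, Weighted = weighted_sensitivity))
```

The macro average treats every class equally, whereas the weighted average lets frequent classes count for more, so the two diverge when classes are imbalanced and performance differs between them.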

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Other evaluation functions:
`binomial_metrics()`,
`confusion_matrix()`,
`evaluate_residuals()`,
`gaussian_metrics()`,
`multinomial_metrics()`

```
# Attach packages
library(cvms)
library(dplyr)

# Load data
data <- participant.scores

# Fit models
gaussian_model <- lm(age ~ diagnosis, data = data)
# Logistic regression, so the predictions are probabilities
binomial_model <- glm(diagnosis ~ score, data = data, family = "binomial")

# Add predictions
data[["gaussian_predictions"]] <- predict(gaussian_model, data,
  type = "response"
)
data[["binomial_predictions"]] <- predict(binomial_model, data,
  type = "response"
)

# Gaussian evaluation
evaluate(
  data = data, target_col = "age",
  prediction_cols = "gaussian_predictions",
  type = "gaussian"
)

# Binomial evaluation
evaluate(
  data = data, target_col = "diagnosis",
  prediction_cols = "binomial_predictions",
  type = "binomial"
)

#
# Multinomial
#

# Create a tibble with predicted probabilities and targets
data_mc <- multiclass_probability_tibble(
  num_classes = 3, num_observations = 45,
  apply_softmax = TRUE, FUN = runif,
  class_name = "class_",
  add_targets = TRUE
)

class_names <- paste0("class_", 1:3)

# Multinomial evaluation
evaluate(
  data = data_mc, target_col = "Target",
  prediction_cols = class_names,
  type = "multinomial"
)

#
# ID evaluation
#

# Gaussian ID evaluation
# Note that 'age' is the same for all observations
# of a participant
evaluate(
  data = data, target_col = "age",
  prediction_cols = "gaussian_predictions",
  id_col = "participant",
  type = "gaussian"
)

# Binomial ID evaluation
evaluate(
  data = data, target_col = "diagnosis",
  prediction_cols = "binomial_predictions",
  id_col = "participant",
  id_method = "mean", # alternatively: "majority"
  type = "binomial"
)

# Multinomial ID evaluation

# Add IDs and new targets (must be constant within IDs)
data_mc[["Target"]] <- NULL
data_mc[["ID"]] <- rep(1:9, each = 5)
id_classes <- tibble::tibble(
  "ID" = 1:9,
  "Target" = sample(x = class_names, size = 9, replace = TRUE)
)
data_mc <- data_mc %>%
  dplyr::left_join(id_classes, by = "ID")

# Perform ID evaluation
evaluate(
  data = data_mc, target_col = "Target",
  prediction_cols = class_names,
  id_col = "ID",
  id_method = "mean", # alternatively: "majority"
  type = "multinomial"
)

#
# Training and evaluating a multinomial model with nnet
#

# Create a data frame with some predictors and a target column
class_names <- paste0("class_", 1:4)
data_for_nnet <- multiclass_probability_tibble(
  num_classes = 3, # Here, number of predictors
  num_observations = 30,
  apply_softmax = FALSE,
  FUN = rnorm,
  class_name = "predictor_"
) %>%
  dplyr::mutate(Target = sample(
    class_names,
    size = 30,
    replace = TRUE
  ))

# Train multinomial model using the nnet package
mn_model <- nnet::multinom(
  "Target ~ predictor_1 + predictor_2 + predictor_3",
  data = data_for_nnet
)

# Predict the targets in the dataset
# (we would usually use a test set instead)
predictions <- predict(
  mn_model,
  data_for_nnet,
  type = "probs"
) %>%
  dplyr::as_tibble()

# Add the targets
predictions[["Target"]] <- data_for_nnet[["Target"]]

# Evaluate predictions
evaluate(
  data = predictions,
  target_col = "Target",
  prediction_cols = class_names,
  type = "multinomial"
)
```

