most_challenging {cvms} | R Documentation |

Finds the data points that, overall, were the most challenging to predict, based on a prediction metric.

```
most_challenging(
  data,
  type,
  obs_id_col = "Observation",
  target_col = "Target",
  prediction_cols = ifelse(type == "gaussian", "Prediction", "Predicted Class"),
  threshold = 0.15,
  threshold_is = "percentage",
  metric = NULL,
  cutoff = 0.5
)
```

`data` |
Predictions can be passed as values, predicted classes or predicted probabilities:

## Multinomial
When `type` is `"multinomial"`, the predictions can be passed as probabilities or as classes.

### Probabilities (Preferable)
One column per class with the probability of that class. The columns should have the name of their class, as they are named in the target column.

### Classes
A single column with the predicted classes.

## Binomial
When `type` is `"binomial"`, the predictions can be passed as a probability or as classes.

### Probabilities (Preferable)
One column with the predicted probability.

Note: The class labels are ordered alphabetically.

### Classes
A single column with the predicted classes.

## Gaussian
When `type` is `"gaussian"`, the predictions are passed as values.

`type` |
Type of task used to get the predictions: `"gaussian"`, `"binomial"` or `"multinomial"`.

`obs_id_col` |
Name of column with observation IDs. This will be used to aggregate the performance of each observation.

`target_col` |
Name of column with the true classes/values in `data`.

`prediction_cols` |
Name(s) of column(s) with the predictions.

`threshold` |
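Threshold to filter observations by. Depends on `type` and `threshold_is`.

## Gaussian

### threshold_is "percentage"
(Approximate) percentage of the observations with the largest root mean square errors to return.

### threshold_is "score"
Observations with a root mean square error larger than or equal to the threshold are returned.

## Binomial, Multinomial

### threshold_is "percentage"
(Approximate) percentage of the observations to return with:

- MAE, Cross Entropy: the highest error scores.
- Accuracy: the lowest accuracy scores.

### threshold_is "score"
- MAE, Cross Entropy: observations with an error score larger than or equal to the threshold.
- Accuracy: observations with an accuracy smaller than or equal to the threshold.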

`threshold_is` |
Either `"percentage"` or `"score"`.

`metric` |
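The metric to use. If `NULL`, the default depends on the format of the prediction columns.

## Binomial, Multinomial
When the predictions are predicted classes, the default is `"Accuracy"`. When the predictions are probabilities, the default is `"MAE"`; `"Cross Entropy"` can be selected instead.

## Gaussian
Ignored. Always uses `"RMSE"`.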

`cutoff` |
Threshold for predicted classes. (Numeric) N.B. |
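
The Gaussian case described under `threshold` can be sketched by hand in base R. This is only an illustration and not the package's implementation: the helper names (`rmse_per_observation`, `filter_by_score`, `filter_by_percentage`) are hypothetical, and the use of a quantile to realise the "(approximate) percentage" is an assumption.

```r
# Hypothetical helper (not part of cvms): aggregate squared errors per
# observation ID and return one RMSE per observation, largest first.
rmse_per_observation <- function(data,
                                 obs_id_col = "Observation",
                                 target_col = "Target",
                                 prediction_col = "Prediction") {
  sq_err <- (data[[target_col]] - data[[prediction_col]])^2
  mse <- tapply(sq_err, data[[obs_id_col]], mean)
  out <- data.frame(
    Observation = names(mse),
    RMSE = sqrt(as.numeric(mse)),
    stringsAsFactors = FALSE
  )
  out[order(out$RMSE, decreasing = TRUE), , drop = FALSE]
}

# threshold_is = "score": keep observations with RMSE >= threshold.
filter_by_score <- function(scores, threshold) {
  scores[scores$RMSE >= threshold, , drop = FALSE]
}

# threshold_is = "percentage": keep roughly the top `threshold` share.
# Assumption: the cut point is taken as a quantile of the RMSE scores.
filter_by_percentage <- function(scores, threshold) {
  cut_point <- stats::quantile(scores$RMSE, probs = 1 - threshold, names = FALSE)
  scores[scores$RMSE >= cut_point, , drop = FALSE]
}

# Toy data: four observations, each predicted twice.
toy <- data.frame(
  Observation = rep(1:4, times = 2),
  Target      = c(10, 20, 30, 40, 10, 20, 30, 40),
  Prediction  = c(11, 20, 36, 40,  9, 20, 38, 42)
)

scores <- rmse_per_observation(toy)
by_score <- filter_by_score(scores, threshold = 1.4)
by_percentage <- filter_by_percentage(scores, threshold = 0.25)
```

In this toy data, observation 3 has by far the largest errors, so it is the one both filters agree on; the score filter additionally keeps observation 4.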

`data.frame` with the most challenging observations and their metrics.

`>=` / `<=` denotes the threshold as score.

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

```
# Attach packages
library(cvms)
library(dplyr)

##
## Multinomial
##

# Find the most challenging data points (per classifier)
# in the predicted.musicians dataset
# which resembles the "Predictions" tibble from the evaluation results

# Passing predicted probabilities
# Observations with 30% highest MAE scores
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  prediction_cols = c("A", "B", "C", "D"),
  type = "multinomial",
  threshold = 0.30
)

# Observations with 25% highest Cross Entropy scores
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  prediction_cols = c("A", "B", "C", "D"),
  type = "multinomial",
  threshold = 0.25,
  metric = "Cross Entropy"
)

# Passing predicted classes
# Observations with 30% lowest Accuracy scores
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  prediction_cols = "Predicted Class",
  type = "multinomial",
  threshold = 0.30
)

# The 40% lowest-scoring on accuracy per classifier
predicted.musicians %>%
  dplyr::group_by(Classifier) %>%
  most_challenging(
    obs_id_col = "ID",
    prediction_cols = "Predicted Class",
    type = "multinomial",
    threshold = 0.40
  )

# Accuracy scores below 0.05
most_challenging(
  predicted.musicians,
  obs_id_col = "ID",
  type = "multinomial",
  threshold = 0.05,
  threshold_is = "score"
)

##
## Binomial
##

# Subset the predicted.musicians
binom_data <- predicted.musicians %>%
  dplyr::filter(Target %in% c("A", "B")) %>%
  dplyr::rename(Prediction = B)

# Passing probabilities
# Observations with 30% highest MAE
most_challenging(
  binom_data,
  obs_id_col = "ID",
  type = "binomial",
  prediction_cols = "Prediction",
  threshold = 0.30
)

# Observations with 30% highest Cross Entropy
most_challenging(
  binom_data,
  obs_id_col = "ID",
  type = "binomial",
  prediction_cols = "Prediction",
  threshold = 0.30,
  metric = "Cross Entropy"
)

# Passing predicted classes
# Observations with 30% lowest Accuracy scores
most_challenging(
  binom_data,
  obs_id_col = "ID",
  type = "binomial",
  prediction_cols = "Predicted Class",
  threshold = 0.30
)

##
## Gaussian
##

set.seed(1)

# 10 observation IDs, each repeated 3 times (30 rows)
df <- data.frame(
  "Observation" = rep(1:10, times = 3),
  "Target" = rnorm(n = 30, mean = 25, sd = 5),
  "Prediction" = rnorm(n = 30, mean = 27, sd = 7)
)

# The 20% highest RMSE scores
most_challenging(
  df,
  type = "gaussian",
  threshold = 0.2
)

# RMSE scores above 9
most_challenging(
  df,
  type = "gaussian",
  threshold = 9,
  threshold_is = "score"
)
```
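
To make the binomial metrics named in the examples more concrete, the following base R sketch computes a per-observation MAE, Cross Entropy and Accuracy by hand. It is an illustration under stated assumptions rather than the package's code: the toy data is made up, the formulas are the standard textbook definitions (which may differ in detail from what the package computes), and the probability is taken to refer to class `"B"`, mirroring the `rename(Prediction = B)` step in the binomial example.

```r
# Made-up toy data: probability of class "B" for three observations,
# each predicted twice.
toy <- data.frame(
  ID         = rep(c("a", "b", "c"), times = 2),
  Target     = rep(c("B", "A", "B"), times = 2),
  Prediction = c(0.9, 0.2, 0.4, 0.8, 0.6, 0.3),
  stringsAsFactors = FALSE
)

positive <- "B"   # assumed positive class
cutoff <- 0.5     # same default as the `cutoff` argument

y <- as.numeric(toy$Target == positive)  # 1 when the target is "B"
p <- toy$Prediction

# Per-row scores (standard definitions, assumed here)
abs_err  <- abs(y - p)
log_loss <- -(y * log(p) + (1 - y) * log(1 - p))
correct  <- as.numeric(ifelse(p > cutoff, positive, "A") == toy$Target)

# Aggregate per observation ID; tapply() orders groups alphabetically,
# which matches sort(unique(toy$ID)).
per_obs <- data.frame(
  ID           = sort(unique(toy$ID)),
  MAE          = as.numeric(tapply(abs_err, toy$ID, mean)),
  CrossEntropy = as.numeric(tapply(log_loss, toy$ID, mean)),
  Accuracy     = as.numeric(tapply(correct, toy$ID, mean)),
  stringsAsFactors = FALSE
)

# The hardest observation has the highest error and the lowest accuracy.
hardest_by_mae <- per_obs$ID[which.max(per_obs$MAE)]
hardest_by_ce  <- per_obs$ID[which.max(per_obs$CrossEntropy)]
hardest_by_acc <- per_obs$ID[which.min(per_obs$Accuracy)]
```

Here all three metrics single out observation `"c"`, whose predicted probability for its true class stays below the cutoff in both predictions.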

[Package *cvms* version 1.3.3 Index]