test_arguments {testarguments} | R Documentation |
Test (multiple) arguments of a prediction algorithm
Description
Test the performance of a prediction algorithm over a range of argument values. Multiple arguments can be tested simultaneously.
Usage
test_arguments(pred_fun, df_train, df_test, diagnostic_fun, arguments)
Arguments
pred_fun | The prediction algorithm to be tested. It should be a function with formal arguments df_train and df_test, together with the arguments to be tested (in the example below, degree and link).
df_train | Training data.
df_test | Testing data.
diagnostic_fun | The criteria with which the predictive performance will be assessed.
arguments | Named list of arguments and their values to check.
Details
For each combination of the supplied argument levels, the value of pred_fun() is combined with df_test using cbind(), and the result is then passed into diagnostic_fun() to compute the diagnostics. Since the number of columns in the returned value of pred_fun() is arbitrary, one can test both predictions and uncertainty quantification of the predictions (e.g., by including prediction standard errors or predictive interval bounds).
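The procedure described above can be sketched in base R. This is a minimal illustration only, not the package's implementation: sketch_test_arguments(), toy_pred_fun() and toy_diagnostic_fun() are hypothetical names introduced here, and the sketch assumes that every combination of argument levels is formed with expand.grid().

```r
## Hypothetical helper (NOT part of testarguments): evaluate diagnostic_fun
## for every combination of the supplied argument levels.
sketch_test_arguments <- function(pred_fun, df_train, df_test,
                                  diagnostic_fun, arguments) {
  ## One row per combination of argument levels (assumed behaviour)
  combos <- expand.grid(arguments, stringsAsFactors = FALSE)
  diagnostics <- lapply(seq_len(nrow(combos)), function(i) {
    args_i <- as.list(combos[i, , drop = FALSE])
    ## Call pred_fun with the data and the current argument levels
    pred <- do.call(pred_fun,
                    c(list(df_train = df_train, df_test = df_test), args_i))
    ## Combine the predictions with df_test, then compute the diagnostics
    diagnostic_fun(cbind(df_test, pred))
  })
  cbind(combos, do.call(rbind, diagnostics))
}

## Toy data and functions, for illustration only
set.seed(1)
df <- data.frame(x = runif(100))
df$Z <- 1 + 2 * df$x + rnorm(100, sd = 0.1)
df_train <- df[1:50, ]
df_test  <- df[51:100, ]

toy_pred_fun <- function(df_train, df_test, degree) {
  M <- lm(Z ~ poly(x, degree), data = df_train)
  data.frame(fit_Z = predict(M, df_test))
}
toy_diagnostic_fun <- function(df) {
  with(df, c(RMSE = sqrt(mean((Z - fit_Z)^2))))
}

res <- sketch_test_arguments(toy_pred_fun, df_train, df_test,
                             toy_diagnostic_fun, list(degree = 1:3))
print(res)
```

The result has one row per argument combination, with the argument levels followed by one column per diagnostic. Because diagnostic_fun() only sees the combined data frame, any extra columns returned by pred_fun() (such as interval bounds) are available to it by name.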
Value
An object of class 'testargs' containing all information from the testing procedure.
See Also
plot_diagnostics, optimal_arguments
Examples
library("testarguments")
## Simulate training and testing data
RNGversion("3.6.0"); set.seed(1)
n <- 1000 # sample size
x <- seq(-1, 1, length.out = n) # covariates
mu <- exp(3 + 2 * x * (x - 1) * (x + 1) * (x - 2)) # polynomial function in x
Z <- rpois(n, mu) # simulate data
df <- data.frame(x = x, Z = Z, mu = mu)
train_id <- sample(1:n, n/2, replace = FALSE)
df_train <- df[train_id, ]
df_test <- df[-train_id, ]
## Algorithm that uses df_train to predict over df_test. We use glm(), and
## test the degree of the regression polynomial and the link function.
pred_fun <- function(df_train, df_test, degree, link) {
  M <- glm(Z ~ poly(x, degree), data = df_train,
           family = poisson(link = as.character(link)))
  ## Predict over df_test
  pred <- as.data.frame(predict(M, df_test, type = "link", se.fit = TRUE))
  ## Compute response level predictions and 90% prediction interval
  inv_link <- family(M)$linkinv
  fit_Y <- pred$fit
  se_Y <- pred$se.fit
  pred <- data.frame(fit_Z = inv_link(fit_Y),
                     upr_Z = inv_link(fit_Y + 1.645 * se_Y),
                     lwr_Z = inv_link(fit_Y - 1.645 * se_Y))
  return(pred)
}
## Define diagnostic function. Should return a named vector
diagnostic_fun <- function(df) {
  with(df, c(
    RMSE = sqrt(mean((Z - fit_Z)^2)),
    MAE = mean(abs(Z - fit_Z)),
    coverage = mean(lwr_Z < mu & mu < upr_Z)
  ))
}
## Compute the user-defined diagnostics over a range of argument levels
testargs_object <- test_arguments(
  pred_fun, df_train, df_test, diagnostic_fun,
  arguments = list(degree = 1:6, link = c("log", "sqrt"))
)
## Visualise the performance across all combinations of the supplied arguments
plot_diagnostics(testargs_object)
## Focus on a subset of the tested arguments
plot_diagnostics(testargs_object, focused_args = "degree")
## Compute the optimal arguments for each diagnostic
optimal_arguments(
  testargs_object,
  optimality_criterion = list(coverage = function(x) which.min(abs(x - 0.90)))
)
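
The optimality criterion supplied for coverage above is an ordinary function that returns the position of the preferred value. The following base-R sketch shows what it selects on a vector of made-up coverage values; the numbers and the name closest_to_nominal are illustrative assumptions, not output from the package.

```r
## Made-up coverage values, one per hypothetical argument combination
coverage <- c(0.71, 0.86, 0.93, 0.99)

## Same criterion as in the example: index of the value closest to the
## nominal 90% level (rather than the smallest value)
closest_to_nominal <- function(x) which.min(abs(x - 0.90))

best <- closest_to_nominal(coverage)
print(best)            # index of the selected combination
print(coverage[best])  # the coverage closest to 0.90
```

Here the third value is selected, since 0.93 is nearer to 0.90 than any of the others. A criterion of this form suits diagnostics such as coverage, where the aim is to match a nominal level and not to minimise or maximise the value.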