Incremental {sharp} | R Documentation |
Incremental prediction performance in regression
Description
Computes the prediction performance of regression models in which predictors
are added sequentially, in order of decreasing selection proportion. This
function can be used to evaluate the marginal contribution of each selected
predictor over and above the more stable predictors. Performance is evaluated
as in ExplanatoryPerformance.
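The idea can be pictured with a minimal base-R sketch. It is not the sharp implementation: the selection proportions in `selprop`, the simulated data and the helper `auc()` are all assumptions made up for illustration, and a single training/test split is used with unpenalised logistic regression. Predictors are ordered by decreasing selection proportion, then nested models are refitted and scored on the test set.

```r
# Illustrative sketch only: names and values below are hypothetical.
set.seed(1)
n <- 400
p <- 5
x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("var", 1:p)))
# Only var1 and var2 contribute to the simulated binary outcome
y <- rbinom(n, size = 1, prob = plogis(1.5 * x[, "var1"] - 1 * x[, "var2"]))

# Hypothetical selection proportions (in practice, from stability selection)
selprop <- c(var1 = 0.98, var2 = 0.91, var3 = 0.35, var4 = 0.20, var5 = 0.12)
ordered_vars <- names(sort(selprop, decreasing = TRUE))

# Single training/test split (80% training)
train <- sample(n, size = 0.8 * n)
test <- setdiff(seq_len(n), train)

# Rank-based (Mann-Whitney) estimate of the area under the ROC curve
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Refit nested models, adding one predictor at a time
incremental_auc <- sapply(seq_along(ordered_vars), function(k) {
  vars <- ordered_vars[seq_len(k)]
  dat <- data.frame(y = y, x[, vars, drop = FALSE])
  fit <- glm(y ~ ., data = dat[train, , drop = FALSE], family = binomial)
  pred <- predict(fit, newdata = dat[test, , drop = FALSE], type = "response")
  auc(pred, y[test])
})
names(incremental_auc) <- ordered_vars
print(round(incremental_auc, 3))
```

Each entry of `incremental_auc` is the test-set AUC of the model containing that predictor and all predictors ranked above it, so the change from one entry to the next reflects the marginal contribution of the added predictor.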
Usage
Incremental(
  xdata,
  ydata,
  new_xdata = NULL,
  new_ydata = NULL,
  stability = NULL,
  family = NULL,
  implementation = NULL,
  prediction = NULL,
  resampling = "subsampling",
  n_predictors = NULL,
  K = 100,
  tau = 0.8,
  seed = 1,
  n_thr = NULL,
  time = 1000,
  verbose = TRUE,
  ...
)
Arguments
xdata |
matrix of predictors with observations as rows and variables as columns. |
ydata |
optional vector or matrix of outcome(s). If |
new_xdata |
optional test set (predictor data). |
new_ydata |
optional test set (outcome data). |
stability |
output of |
family |
type of regression model. Possible values include
|
implementation |
optional function to refit the model. If
|
prediction |
optional function to compute predicted values from the
model refitted with |
resampling |
resampling approach to create the training set. The default
is |
n_predictors |
number of predictors to consider. |
K |
number of training-test splits. Only used if |
tau |
proportion of observations used in the training set. Only used if
|
seed |
value of the seed to ensure reproducibility of the results. Only
used if |
n_thr |
number of thresholds to use to construct the ROC curve. If
|
time |
numeric indicating the time for which the survival probabilities are computed. Only applicable to Cox regression. |
verbose |
logical indicating if a loading bar and messages should be printed. |
... |
additional parameters passed to the function provided in
|
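As a rough picture of how K, tau and seed interact when no test set is supplied, the following base-R sketch draws repeated training/test splits by plain subsampling. It is an assumption-laden illustration, not the package's resampling code (which may, for example, stratify by outcome); the object `splits` and the sample size `n` are hypothetical.

```r
# Illustrative sketch only: plain (unstratified) subsampling.
set.seed(1)  # fixing the seed makes the splits reproducible
n <- 50      # hypothetical number of observations
K <- 100     # number of training/test splits
tau <- 0.8   # proportion of observations in each training set

splits <- lapply(seq_len(K), function(k) {
  train <- sample(n, size = floor(tau * n))
  list(train = train, test = setdiff(seq_len(n), train))
})

# Every split partitions the observations into training and test sets
print(length(splits))
print(lengths(splits[[1]]))
```

With these values each split holds 40 training and 10 test observations, and a model would be refitted and evaluated once per split.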
Value
An object of class incremental.
For logistic regression, a list with:
FPR |
A list with, for each of the models (sequentially added predictors), the False Positive Rates for different thresholds (columns) and different data splits (rows). |
TPR |
A list with, for each of the models (sequentially added predictors), the True Positive Rates for different thresholds (columns) and different data splits (rows). |
AUC |
A list with, for each of the models (sequentially added predictors), a vector of Area Under the Curve (AUC) values obtained with different data splits. |
Beta |
Estimated regression coefficients from visited models. |
names |
Names of the predictors by order of inclusion. |
stable |
Binary vector indicating
which predictors are stably selected. Only returned if |
For Cox regression, a list with:
concordance |
A list with, for each of the models (sequentially added predictors), a vector of concordance indices obtained with different data splits. |
Beta |
Estimated regression coefficients from visited models. |
names |
Names of the predictors by order of inclusion. |
stable |
Binary vector indicating
which predictors are stably selected. Only returned if |
For linear regression, a list with:
Q_squared |
A list with, for each of the models (sequentially added predictors), a vector of Q-squared obtained with different data splits. |
Beta |
Estimated regression coefficients from visited models. |
names |
Names of the predictors by order of inclusion. |
stable |
Binary vector indicating which
predictors are stably selected. Only returned if |
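The list structure described above lends itself to simple summaries across data splits. The sketch below uses a mock object with the AUC and names elements described for logistic regression; the values are randomly generated placeholders, not results from Incremental, and `mock` and `summary_auc` are hypothetical names.

```r
# Illustrative sketch only: 'mock' mimics the documented structure
# with made-up values; it is not output from Incremental().
set.seed(1)
K <- 10  # number of data splits
mock <- list(
  AUC = list(
    runif(K, min = 0.60, max = 0.70),  # model with 1 predictor
    runif(K, min = 0.70, max = 0.80),  # model with 2 predictors
    runif(K, min = 0.70, max = 0.80)   # model with 3 predictors
  ),
  names = c("var1", "var2", "var3")
)

# One row per model: 5%, 50% and 95% quantiles of the AUC across splits
summary_auc <- t(sapply(mock$AUC, quantile, probs = c(0.05, 0.5, 0.95)))
rownames(summary_auc) <- mock$names
print(round(summary_auc, 3))
```

The same pattern would apply to the concordance or Q_squared elements, since they are described as lists of per-split vectors as well.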
See Also
Other prediction performance functions:
ExplanatoryPerformance()
Examples
# Data simulation
set.seed(1)
simul <- SimulateRegression(
  n = 1000, pk = 20,
  family = "binomial", ev_xy = 0.8
)

# Data split: selection, training and test set
ids <- Split(
  data = simul$ydata,
  family = "binomial",
  tau = c(0.4, 0.3, 0.3)
)
xselect <- simul$xdata[ids[[1]], ]
yselect <- simul$ydata[ids[[1]], ]
xtrain <- simul$xdata[ids[[2]], ]
ytrain <- simul$ydata[ids[[2]], ]
xtest <- simul$xdata[ids[[3]], ]
ytest <- simul$ydata[ids[[3]], ]

# Stability selection
stab <- VariableSelection(
  xdata = xselect,
  ydata = yselect,
  family = "binomial"
)

# Performances in test set of model refitted in training set
incr <- Incremental(
  xdata = xtrain, ydata = ytrain,
  new_xdata = xtest, new_ydata = ytest,
  stability = stab, n_predictors = 10
)
plot(incr)

# Alternative with multiple training/test splits
incr <- Incremental(
  xdata = rbind(xtrain, xtest),
  ydata = c(ytrain, ytest),
  stability = stab, K = 10, n_predictors = 10
)
plot(incr)