nested.cv {TANDEM}    R Documentation
Estimating predictive performance via nested cross-validation
Description
Performs nested cross-validation to assess predictive performance. The inner loop determines the optimal lambda (as in cv.glmnet), and the outer loop assesses predictive performance in an unbiased way: the samples in each outer test fold are used neither to fit the model nor to select lambda.
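The two loops described above can be sketched in base R as follows. This is not TANDEM's implementation. Closed-form ridge regression stands in for glmnet so that the sketch runs without extra packages. The inner loop simply picks the lambda with the lowest CV error, which is a lambda.min-style choice rather than the default lambda.1se. All function names (ridge_fit, ridge_predict, make_folds, choose_lambda, nested_cv_sketch) are hypothetical.

```r
# Base-R sketch of nested cross-validation (NOT TANDEM's code).
# Assumption: closed-form ridge regression stands in for glmnet.

# Ridge fit with an unpenalized intercept (features and response centered).
ridge_fit = function(x, y, lambda) {
  x_mean = colMeans(x)
  y_mean = mean(y)
  xc = sweep(x, 2, x_mean)
  beta = solve(crossprod(xc) + lambda * diag(ncol(x)), crossprod(xc, y - y_mean))
  list(beta = beta, x_mean = x_mean, y_mean = y_mean)
}

ridge_predict = function(fit, newx) {
  drop(sweep(newx, 2, fit$x_mean) %*% fit$beta) + fit$y_mean
}

# Random, balanced fold assignment (similar to cv.glmnet's default).
make_folds = function(n, k) sample(rep(seq_len(k), length.out = n))

# Inner loop: choose lambda by k-fold CV on the training samples only.
choose_lambda = function(x, y, lambdas, k) {
  folds = make_folds(nrow(x), k)
  cv_mse = sapply(lambdas, function(lam) {
    mean(sapply(seq_len(k), function(f) {
      test = folds == f
      fit = ridge_fit(x[!test, , drop = FALSE], y[!test], lam)
      mean((y[test] - ridge_predict(fit, x[test, , drop = FALSE]))^2)
    }))
  })
  lambdas[which.min(cv_mse)]  # lambda.min-style choice
}

# Outer loop: each sample is predicted by a model whose lambda and
# coefficients were chosen without ever seeing that sample.
nested_cv_sketch = function(x, y, lambdas, nfolds = 10, nfolds_inner = 10) {
  foldid = make_folds(nrow(x), nfolds)
  y_hat = numeric(length(y))
  for (f in seq_len(nfolds)) {
    test = foldid == f
    lam = choose_lambda(x[!test, , drop = FALSE], y[!test], lambdas, nfolds_inner)
    fit = ridge_fit(x[!test, , drop = FALSE], y[!test], lam)
    y_hat[test] = ridge_predict(fit, x[test, , drop = FALSE])
  }
  list(y_hat = y_hat, mse = mean((y - y_hat)^2))
}

# Simulated data, for illustration only.
set.seed(1)
x = matrix(rnorm(100 * 5), 100, 5)
y = drop(x %*% c(2, -1, 0, 0, 1)) + rnorm(100)
res = nested_cv_sketch(x, y, lambdas = 10^seq(2, -2, length.out = 20))
res$mse
```

Because a fresh inner CV runs inside every outer fold, the selected lambda can differ from fold to fold. That is the point: the tuning step is part of what the outer loop evaluates.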
Usage
nested.cv(
x,
y,
upstream,
method = "tandem",
family = "gaussian",
nfolds = 10,
nfolds_inner = 10,
foldid = NULL,
lambda_upstream = "lambda.1se",
lambda_downstream = "lambda.1se",
lambda_glmnet = "lambda.1se",
...
)
Arguments
x
A feature matrix, where the rows correspond to samples and the columns to features.
y
A numeric vector containing the response, with one entry per row of x.
upstream
A logical index vector that indicates for each feature (column of x) whether it is upstream (TRUE) or downstream (FALSE).
method
Indicates whether the nested cross-validation is performed on TANDEM or on the classic approach (glmnet). Should be either "tandem" or "glmnet".
family
The family parameter that is passed to cv.glmnet(). Currently, only family='gaussian' is supported.
nfolds
Number of cross-validation folds used in the outer cross-validation loop (default is 10).
nfolds_inner
Number of cross-validation folds used in the inner cross-validation loop to determine the optimal lambda (default is 10).
foldid
An optional vector assigning each sample to a fold of the outer cross-validation loop. Overrides nfolds when supplied.
lambda_upstream
Only used when method='tandem'. Whether glmnet should use lambda.min or lambda.1se in the first stage (fitted on the upstream features). Default is lambda.1se.
lambda_downstream
Only used when method='tandem'. Whether glmnet should use lambda.min or lambda.1se in the second stage (fitted on the downstream features). Default is lambda.1se.
lambda_glmnet
Only used when method='glmnet'. Whether glmnet should use lambda.min or lambda.1se. Default is lambda.1se.
...
Other parameters that are passed to cv.glmnet().
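The lambda.min / lambda.1se choice offered by lambda_upstream, lambda_downstream and lambda_glmnet follows cv.glmnet's conventions. lambda.min gives the minimum mean cross-validated error. lambda.1se is the largest lambda whose error lies within one standard error of that minimum. The following base-R sketch applies that rule to a made-up CV curve. pick_lambdas is a hypothetical helper, not part of TANDEM or glmnet, and the numbers are illustrative only.

```r
# Sketch of the lambda.min / lambda.1se rule (hypothetical helper).
# lambda: decreasing grid; cvm: mean CV error; cvsd: its standard error.
pick_lambdas = function(lambda, cvm, cvsd) {
  i_min = which.min(cvm)                 # index of the minimum CV error
  threshold = cvm[i_min] + cvsd[i_min]   # minimum plus one standard error
  list(lambda.min = lambda[i_min],
       lambda.1se = max(lambda[cvm <= threshold]))
}

# Made-up CV curve for illustration.
lambda = c(1, 0.5, 0.25, 0.125, 0.0625)
cvm    = c(2.00, 1.30, 1.10, 1.00, 1.05)
cvsd   = c(0.20, 0.15, 0.12, 0.12, 0.13)
pick_lambdas(lambda, cvm, cvsd)
# lambda.min = 0.125, lambda.1se = 0.25
```

Because lambda.1se is larger, it yields a more strongly regularized (typically sparser) model at a small cost in CV error. This is a common reason to prefer it as a default.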
Value
A list containing y_hat, the response vector predicted in the outer cross-validation loop, and mse, the corresponding mean-squared error.
Examples
library(TANDEM)
# unpack the example data
x = example_data$x
y = example_data$y
upstream = example_data$upstream
# assess the prediction error in a nested cross-validation loop
# fix the seed so that both methods use the same fold assignments
set.seed(1)
cv_tandem = nested.cv(x, y, upstream, method="tandem", alpha=0.5)
set.seed(1)
cv_glmnet = nested.cv(x, y, upstream, method="glmnet", alpha=0.5)
barplot(c(cv_tandem$mse, cv_glmnet$mse), ylab="MSE", names.arg=c("TANDEM", "Classic approach"))
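Instead of resetting the seed before each call, one can draw the outer fold assignment once and pass it to both calls through foldid. This is a minimal sketch that assumes TANDEM is installed; the calls are skipped otherwise. make_foldid is a hypothetical helper whose recipe is similar to cv.glmnet's default fold assignment. Per the description of foldid, this fixes only the outer loop. The inner folds used to pick lambda are presumably still drawn at random inside nested.cv.

```r
# Hypothetical helper (not part of TANDEM): balanced random fold labels.
make_foldid = function(n, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n))
}

if (requireNamespace("TANDEM", quietly = TRUE)) {
  library(TANDEM)
  x = example_data$x
  y = example_data$y
  upstream = example_data$upstream
  set.seed(1)
  # One outer fold assignment shared by both methods
  foldid = make_foldid(nrow(x), nfolds = 10)
  cv_tandem = nested.cv(x, y, upstream, method = "tandem", foldid = foldid, alpha = 0.5)
  cv_glmnet = nested.cv(x, y, upstream, method = "glmnet", foldid = foldid, alpha = 0.5)
  print(c(TANDEM = cv_tandem$mse, glmnet = cv_glmnet$mse))
}
```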