cv_gds {hdme} | R Documentation |
Cross-Validated Generalized Dantzig Selector
Description
Generalized Dantzig Selector with cross-validation.
Usage
cv_gds(
  X,
  y,
  family = "gaussian",
  no_lambda = 10,
  lambda = NULL,
  n_folds = 5,
  weights = rep(1, length(y))
)
Arguments
X | Design matrix. |
y | Vector of response values: continuous for "gaussian", binary (0/1) for "binomial", counts for "poisson". |
family | Use "gaussian" for linear regression, "binomial" for logistic regression and "poisson" for Poisson regression. |
no_lambda | Length of the vector lambda. |
lambda | Regularization parameter. If not supplied and if |
n_folds | Number of cross-validation folds to use. |
weights | A vector of weights for each row of X. Defaults to rep(1, length(y)), i.e. equal weight for every observation. |
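To make the role of n_folds concrete, the following sketch shows one common way of assigning observations to cross-validation folds. It is an illustration only and is not taken from the hdme source; the names fold_id, test_rows and train_rows are hypothetical, and cv_gds may assign folds differently.

```r
# Hypothetical illustration; not taken from the hdme source.
n <- 23        # number of observations (rows of X)
n_folds <- 5   # same meaning as the n_folds argument

set.seed(1)
# Repeat the labels 1..n_folds up to length n, then shuffle them
fold_id <- sample(rep(seq_len(n_folds), length.out = n))

# Rows held out as the test set for fold 1; the remaining rows are used for fitting
test_rows <- which(fold_id == 1)
train_rows <- which(fold_id != 1)

table(fold_id)  # fold sizes differ by at most one
```

Each fold serves as the test set once, so larger values of n_folds mean more model fits per value of lambda.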
Details
Cross-validation loss is calculated as the deviance of the model divided by the number of observations. For the Gaussian case, this is the mean squared error. Weights supplied through the weights argument are used both in fitting the models and when evaluating the test set deviance.
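As a rough sketch of the Gaussian case described above, the loss on a held-out fold can be computed as a weighted residual sum of squares divided by the number of observations. The helper gaussian_cv_loss is hypothetical and written only for illustration; how hdme normalises the weights internally is an assumption here and may differ.

```r
# Hypothetical helper; hdme's internal computation may differ,
# in particular in how the weights are normalised.
gaussian_cv_loss <- function(y, y_hat, weights = rep(1, length(y))) {
  # Weighted Gaussian deviance (weighted residual sum of squares) divided by n
  sum(weights * (y - y_hat)^2) / length(y)
}

y     <- c(1.0, 2.0, 3.0, 4.0)   # observed test-set responses
y_hat <- c(1.5, 2.0, 2.0, 4.0)   # predictions from a model fitted on the other folds

gaussian_cv_loss(y, y_hat)                           # unit weights: plain mean squared error
gaussian_cv_loss(y, y_hat, weights = c(2, 1, 1, 1))  # first observation counts double
```

With unit weights the result coincides with mean((y - y_hat)^2), which matches the statement that the Gaussian loss is the mean squared error.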
Value
An object of class cv_gds.
References
Candes E, Tao T (2007). “The Dantzig selector: Statistical estimation when p is much larger than n.” Ann. Statist., 35(6), 2313–2351.
James GM, Radchenko P (2009). “A generalized Dantzig selector with shrinkage tuning.” Biometrika, 96(2), 323–337.
Examples
## Not run:
# Example with logistic regression
n <- 1000 # Number of samples
p <- 10 # Number of covariates
X <- matrix(rnorm(n * p), nrow = n) # Design matrix
beta <- c(seq(from = 0.1, to = 1, length.out = 5), rep(0, p-5)) # True regression coefficients
y <- rbinom(n, 1, (1 + exp(-X %*% beta))^(-1)) # Binomially distributed response
cv_fit <- cv_gds(X, y, family = "binomial", no_lambda = 50, n_folds = 10)
print(cv_fit)
plot(cv_fit)
# Now fit a single GDS at the optimum lambda value determined by cross-validation
fit <- gds(X, y, lambda = cv_fit$lambda_min, family = "binomial")
plot(fit)
# Compare this to the fit for which lambda is selected by GDS
# This automatic selection is performed by glmnet::cv.glmnet, for
# the sake of speed
fit2 <- gds(X, y, family = "binomial")
# The following plot compares the two fits.
library(ggplot2)
library(tidyr)
df <- data.frame(fit = fit$beta, fit2 = fit2$beta, index = seq_len(p))
ggplot(gather(df, key = "Model", value = "Coefficient", -index),
       aes(x = index, y = Coefficient, color = Model)) +
  geom_point() +
  theme(legend.title = element_blank())
## End(Not run)