
cv_gds {hdme}

R Documentation

Cross-Validated Generalized Dantzig Selector

Description

Fits the Generalized Dantzig Selector (GDS) for a sequence of regularization parameters and selects `lambda` by k-fold cross-validation.
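For reference, here is a sketch of the optimization problem as it is formulated in the cited papers (Candès and Tao, 2007; James and Radchenko, 2009). The exact scaling of $\lambda$ and the handling of the intercept and the weights in hdme are not stated on this page and may differ. Here $\mu(\cdot)$ is the inverse link applied elementwise: the identity for `"gaussian"` (which recovers the Dantzig selector up to the scaling of $\lambda$), $1/(1 + e^{-\eta})$ for `"binomial"`, and $e^{\eta}$ for `"poisson"`.

```latex
\hat{\beta}_{\lambda} = \arg\min_{\beta} \, \|\beta\|_1
\quad \text{subject to} \quad
\bigl\| X^{\top} \bigl( y - \mu(X\beta) \bigr) \bigr\|_{\infty} \le \lambda
```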

Usage

cv_gds(
  X,
  y,
  family = "gaussian",
  no_lambda = 10,
  lambda = NULL,
  n_folds = 5,
  weights = rep(1, length(y))
)

Arguments

`X`	Design matrix.
`y`	Response vector: continuous for `family = "gaussian"`, binary (0/1) for `"binomial"` and non-negative counts for `"poisson"`.
`family`	Use "gaussian" for linear regression, "binomial" for logistic regression and "poisson" for Poisson regression.
`no_lambda`	Number of regularization parameters in `lambda`. If `lambda` is not supplied, the actual number of values may differ slightly from `no_lambda`, because of the algorithm `glmnet::glmnet` uses to compute its grid of `lambda` values.
`lambda`	Regularization parameter(s). If not supplied and `no_lambda > 1`, a sequence of `no_lambda` regularization parameters is computed with `glmnet::glmnet`. If `no_lambda = 1`, the cross-validated optimum for the lasso is computed with `glmnet::cv.glmnet`.
`n_folds`	Number of cross-validation folds to use.
`weights`	Vector of observation weights, one per row of `X`. Defaults to 1 for every observation.
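The `n_folds` argument sets how many groups the observations are split into. Below is a minimal sketch of one common k-fold scheme. It assumes random assignment into folds of nearly equal size. `assign_folds()` is a hypothetical helper, not part of hdme, and `cv_gds()` may assign folds differently.

```r
# Hypothetical helper (not an hdme function): randomly assign n observations
# to n_folds folds whose sizes differ by at most one
assign_folds <- function(n, n_folds = 5) {
  sample(rep(seq_len(n_folds), length.out = n))
}

set.seed(1)
folds <- assign_folds(n = 23, n_folds = 5)
table(folds)  # five folds of sizes 5, 5, 5, 4, 4 in some order

# For fold k, fit on the remaining folds and evaluate on fold k;
# the observation weights would be subset with the same logical vectors
k <- 1
train <- folds != k
test <- folds == k
```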

Details

The cross-validation loss is the deviance of the model divided by the number of observations. In the Gaussian case this equals the mean squared error. Weights supplied through the `weights` argument are used both when fitting the models and when evaluating the deviance on the test folds.
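Here is a minimal sketch of the loss described above. It assumes the standard unit deviances for each family and, as stated, divides by the number of observations rather than by the sum of the weights. `mean_deviance()` is a hypothetical helper for illustration, not an hdme function.

```r
# Hypothetical helper (not an hdme function): weighted deviance on a test set
# divided by the number of observations.
# y: observed responses; mu: predicted means; w: observation weights
mean_deviance <- function(y, mu, w = rep(1, length(y)),
                          family = c("gaussian", "binomial", "poisson")) {
  family <- match.arg(family)
  # Unit deviance contribution of each observation
  d <- switch(family,
    gaussian = (y - mu)^2,
    binomial = -2 * (y * log(mu) + (1 - y) * log(1 - mu)),  # y in {0, 1}
    poisson  = 2 * (ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
  )
  sum(w * d) / length(y)
}

# With unit weights, the Gaussian case is the mean squared error
mean_deviance(c(1, 2, 3), c(1, 2, 5))  # 4/3
```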

Value

An object of class cv_gds.
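The Examples section reads `cv_fit$lambda_min` from the returned object. The sketch below shows one typical way such a value is chosen. It assumes that the cross-validation curve is the loss averaged over folds and that `lambda_min` is its minimizer. `select_lambda_min()` and the toy loss matrix are hypothetical and illustrative, not output from `cv_gds()`.

```r
# Hypothetical sketch: pick lambda_min from a matrix of cross-validation
# losses with one row per fold and one column per lambda value
select_lambda_min <- function(lambda, cv_loss) {
  stopifnot(ncol(cv_loss) == length(lambda))
  mean_loss <- colMeans(cv_loss)  # average loss over folds for each lambda
  list(lambda_min = lambda[which.min(mean_loss)], mean_loss = mean_loss)
}

# Toy input: 3 folds, 4 lambda values (illustrative numbers only)
lambda <- c(0.5, 0.2, 0.1, 0.05)
cv_loss <- rbind(c(1.0, 0.8, 0.7, 0.9),
                 c(1.1, 0.9, 0.6, 0.8),
                 c(0.9, 0.7, 0.8, 0.7))
res <- select_lambda_min(lambda, cv_loss)
res$lambda_min  # 0.1
```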

References

Candès E, Tao T (2007). “The Dantzig selector: Statistical estimation when p is much larger than n.” Ann. Statist., 35(6), 2313–2351.

James GM, Radchenko P (2009). “A generalized Dantzig selector with shrinkage tuning.” Biometrika, 96(2), 323–337.

Examples

## Not run: 
# Example with logistic regression
library(hdme)
set.seed(1)  # For reproducibility
n <- 1000  # Number of samples
p <- 10  # Number of covariates
X <- matrix(rnorm(n * p), nrow = n)  # Design matrix
beta <- c(seq(from = 0.1, to = 1, length.out = 5), rep(0, p - 5))  # True regression coefficients
y <- rbinom(n, 1, plogis(X %*% beta))  # Binary (Bernoulli) response
cv_fit <- cv_gds(X, y, family = "binomial", no_lambda = 50, n_folds = 10)
print(cv_fit)
plot(cv_fit)

# Now fit a single GDS at the optimum lambda value determined by cross-validation
fit <- gds(X, y, lambda = cv_fit$lambda_min, family = "binomial")
plot(fit)

# Compare this to a fit in which lambda is selected automatically.
# When lambda is not supplied, gds() uses the cross-validated lasso
# optimum from glmnet::cv.glmnet, for the sake of speed
fit2 <- gds(X, y, family = "binomial")

# The following plot compares the coefficient estimates of the two fits
library(ggplot2)
library(tidyr)
df <- data.frame(fit = fit$beta, fit2 = fit2$beta, index = seq_len(p))
df_long <- pivot_longer(df, cols = -index, names_to = "Model", values_to = "Coefficient")
ggplot(df_long, aes(x = index, y = Coefficient, color = Model)) +
  geom_point() +
  theme(legend.title = element_blank())


## End(Not run)

[Package hdme version 0.6.0 Index]