gds {hdme}    R Documentation

Generalized Dantzig Selector

Description

Fits the Generalized Dantzig Selector for linear, logistic, and Poisson regression.

Usage

gds(X, y, lambda = NULL, family = "gaussian", weights = NULL)

Arguments

X

Design matrix.

y

Vector of response values: continuous for family = "gaussian", binary (0/1) for "binomial", and counts for "poisson".

lambda

Regularization parameter. Only a single value is supported.

family

Use "gaussian" for linear regression, "binomial" for logistic regression and "poisson" for Poisson regression.

weights

A vector of weights for each row of X.
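
As a minimal sketch (not from the package's own examples), assuming the call signature shown under Usage carries over unchanged to family = "poisson":

n <- 500 # Number of samples
p <- 10 # Number of covariates
X <- matrix(rnorm(n * p), nrow = n) # Design matrix
beta <- c(0.3, -0.3, 0.2, rep(0, p - 3)) # Sparse true regression coefficients
y <- rpois(n, exp(X %*% beta)) # Poisson distributed counts
fit_pois <- gds(X, y, family = "poisson")
coef(fit_pois)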

Value

Intercept and coefficients at the value of lambda specified.
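
For instance, a sketch relying only on the accessors used in the Examples below, with fit assumed to come from a call like those shown there:

fit <- gds(X, y, family = "gaussian") # assuming X and y as defined in the Examples
coef(fit) # coefficient estimates; see also coef(fit, all = TRUE) in the Examples
fit$intercept # intercept estimate
fit$beta # coefficient vector, used for prediction in the Examples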

References

Candes E, Tao T (2007). “The Dantzig selector: Statistical estimation when p is much larger than n.” Ann. Statist., 35(6), 2313–2351.

James GM, Radchenko P (2009). “A generalized Dantzig selector with shrinkage tuning.” Biometrika, 96(2), 323–337.

Examples

# Example with logistic regression
n <- 1000  # Number of samples
p <- 10 # Number of covariates
X <- matrix(rnorm(n * p), nrow = n) # Design matrix
beta <- c(seq(from = 0.1, to = 1, length.out = 5), rep(0, p-5)) # True regression coefficients
y <- rbinom(n, 1, (1 + exp(-X %*% beta))^(-1)) # Binomially distributed response
fit <- gds(X, y, family = "binomial")
print(fit)
plot(fit)
coef(fit)

# Try with more penalization
fit <- gds(X, y, family = "binomial", lambda = 0.1)
coef(fit)
coef(fit, all = TRUE)


# Case weighting
# Assume we wish to put more emphasis on predicting the positive cases correctly
# In this case we give the 1s three times the weight of the zeros.
weights <- (y == 0) * 1 + (y == 1) * 3
fit_w <- gds(X, y, family = "binomial", weights = weights, lambda = 0.1)

# Next we test this on a new dataset, generated with the same parameters
X_new <- matrix(rnorm(n * p), nrow = n)
y_new <- rbinom(n, 1, (1 + exp(-X_new %*% beta))^(-1))
# We use a 50% threshold as the classification rule
# Unweighted classification
classification <- ((1 + exp(- fit$intercept - X_new %*% fit$beta))^(-1) > 0.5) * 1
# Weighted classification
classification_w <- ((1 + exp(- fit_w$intercept - X_new %*% fit_w$beta))^(-1) > 0.5) * 1

# As expected, the weighted classification predicts many more 1s than the
# unweighted one, since the positive cases are heavily up-weighted
table(classification, classification_w)

# Here we compare the performance of the weighted and unweighted models.
# The weighted model gets most of the 1s right, while the unweighted model
# achieves the higher overall accuracy.
table(classification, y_new)
table(classification_w, y_new)
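
# To quantify this comparison (a small sketch, not part of the original
# example), compute overall accuracy and sensitivity directly:
mean(classification == y_new) # overall accuracy, unweighted model
mean(classification_w == y_new) # overall accuracy, weighted model
mean(classification[y_new == 1] == 1) # sensitivity, unweighted model
mean(classification_w[y_new == 1] == 1) # sensitivity, weighted model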

