R: fit the sparse DWD

sdwd {sdwd}

R Documentation

fit the sparse DWD

Description

Fits the sparse distance weighted discrimination (SDWD) model with L1, elastic-net, or adaptive elastic-net penalties. The solution path is computed over a grid of values of the tuning parameter lambda. This function is adapted from the glmnet and gcdnet packages.

Usage

sdwd(x, y, nlambda=100, 
     lambda.factor=ifelse(nobs < nvars, 0.01, 1e-04), 
     lambda=NULL, lambda2=0, pf=rep(1, nvars), 
     pf2=rep(1, nvars), exclude, dfmax=nvars + 1, 
     pmax=min(dfmax * 1.2, nvars), standardize=TRUE, 
     eps=1e-8, maxit=1e6, strong=TRUE)

Arguments

`x`	A matrix with `N` rows and `p` columns for predictors.
`y`	A vector of length `N` for binary responses. Each element of `y` is either -1 or 1.
`nlambda`	The number of `lambda` values, i.e., length of the `lambda` sequence. Default is 100.
`lambda.factor`	The ratio of the smallest to the largest `lambda` in the sequence: `lambda.factor` = `min(lambda)` / `max(lambda)`. `max(lambda)` is the smallest `lambda` that makes all coefficients zero. The default value of `lambda.factor` is 0.0001 if `N >= p` or 0.01 if `N < p`. It has no effect when the user specifies a `lambda` sequence.
`lambda`	An optional user-supplied `lambda` sequence. If `lambda = NULL` (default), the program computes its own `lambda` sequence based on `nlambda` and `lambda.factor`; otherwise, the program uses the user-specified one. Since the program automatically sorts a user-defined `lambda` sequence in decreasing order, it is better to supply a decreasing sequence.
`lambda2`	The L2 tuning parameter `\lambda_2`. Default is 0.
`pf`	A vector of length `p` giving the L1 penalty weight for each coefficient of `\beta`, used for the adaptive L1 or adaptive elastic net. `pf` can be 0 for some predictor(s), in which case those predictor(s) are always included in the model. One suggested choice of `pf` is `{(|\beta| + 1/n)}^{-1}`, where `n` is the sample size and `\beta` is the vector of coefficients obtained by L1 DWD or enet DWD. Default is 1 for all predictors (and infinity if some predictors are listed in `exclude`).
`pf2`	A vector of length `p` giving the L2 penalty factor for each coefficient of `\beta`, used for the adaptive L1 or adaptive elastic net. To allow different amounts of L2 shrinkage, the user can set `pf2` to a different L2 penalty weight for each coefficient. `pf2` can be 0 for some variables, indicating no L2 shrinkage. Default is 1 for all predictors.
`exclude`	Indices of predictors to be excluded from the model. Excluding a predictor is equivalent to giving it an infinite penalty factor. Default is none.
`dfmax`	The maximum number of predictors allowed in the model. Default is `p+1`. This restriction is helpful when `p` is large, provided that a partial path is acceptable.
`pmax`	The maximum number of variables ever allowed to be nonzero along the path; i.e., once some `\beta` enters the model, it is counted once, and the count does not change when that `\beta` exits or re-enters the model. Default is `min(dfmax*1.2,p)`.
`standardize`	Whether to standardize the data. If `TRUE`, `sdwd` normalizes the predictors so that each column has a mean sum of squares equal to one, i.e., `\sum^N_{i=1}x_{ij}^2/N=1`. Note that `x` is always centered (i.e., `\sum^N_{i=1}x_{ij}=0`) regardless of whether `standardize` is `TRUE` or `FALSE`. `sdwd` always returns the coefficients `beta` on the original scale. Default is `TRUE`.
`eps`	The convergence threshold. The algorithm stops when `4\max_j(\beta_j^{new}-\beta_j^{old})^2` is less than `eps`, where `j=0,\ldots, p`. Default value is `1e-8`.
`maxit`	Restricts how many outer-loop iterations are allowed. Default is 1e6. Consider increasing `maxit` when the algorithm does not converge.
`strong`	If `TRUE`, adopts the strong rule to accelerate the algorithm. Default is `TRUE`.
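
The relationship between `nlambda`, `lambda.factor`, and the computed `lambda` sequence can be illustrated with a small sketch. The helper `lambda_grid` below is hypothetical and not part of the package; the log-scale spacing is an assumption borrowed from the glmnet convention, and `lambda_max` is simply passed in rather than derived from the data.

```r
# Hypothetical helper (not part of sdwd): builds a decreasing lambda grid
# from the documented arguments. Log-scale spacing is an assumption.
lambda_grid <- function(lambda_max, nobs, nvars, nlambda = 100,
                        lambda.factor = ifelse(nobs < nvars, 0.01, 1e-04)) {
  lambda_min <- lambda.factor * lambda_max
  exp(seq(log(lambda_max), log(lambda_min), length.out = nlambda))
}

# Made-up problem size with fewer observations than predictors,
# so the default lambda.factor is 0.01.
grid <- lambda_grid(lambda_max = 2, nobs = 50, nvars = 500)
length(grid)           # number of lambda values
min(grid) / max(grid)  # ratio of smallest to largest lambda
```

A grid built this way is already in decreasing order, which matches the order in which `sdwd` sorts a user-supplied `lambda` sequence.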

Details

The function sdwd minimizes the sparse penalized DWD loss function,

L(y, X, \beta)/N + \lambda_1||\beta||_1 + 0.5\lambda_2||\beta||_2^2,

where L(u) = 1-u if u \le 1/2 and L(u) = 1/(4u) if u > 1/2 is the DWD loss. The value of lambda2 is user-specified.
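
As a rough illustration of the objective above, the following sketch evaluates the DWD loss and the penalized criterion in plain R. The function names are hypothetical, the margin form `y * (b0 + x %*% beta)` is an assumption (the usual convention for margin-based classifiers), and the penalty weights `pf` and `pf2` are ignored for simplicity.

```r
# DWD loss as defined above: 1 - u for u <= 1/2, and 1/(4u) otherwise.
dwd_loss <- function(u) {
  ifelse(u <= 0.5, 1 - u, 1 / (4 * u))
}

# Hypothetical helper: average DWD loss plus L1 and L2 penalties.
# Assumes the margin is y * (b0 + x %*% beta) and unit penalty weights.
sdwd_objective <- function(x, y, b0, beta, lambda1, lambda2 = 0) {
  margin <- y * (b0 + drop(x %*% beta))
  mean(dwd_loss(margin)) +
    lambda1 * sum(abs(beta)) +
    0.5 * lambda2 * sum(beta^2)
}

# Tiny made-up data set: 4 observations, 2 predictors.
x <- matrix(c(1, 0, -1, 0,
              0, 1, 0, -1), ncol = 2)
y <- c(1, 1, -1, -1)

loss_values <- dwd_loss(c(0, 0.5, 1))
obj_null <- sdwd_objective(x, y, b0 = 0, beta = c(0, 0), lambda1 = 0.1)
```

The two branches of the loss agree at u = 1/2, so the loss is continuous there. With all coefficients at zero, every margin is zero and the objective reduces to the loss at zero.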

To use the L1 penalty (lasso), set lambda2=0. To use the elastic net, set lambda2 to a nonzero value. To use the adaptive L1, set lambda2=0 and specify pf and pf2. To use the adaptive elastic net, set lambda2 to a nonzero value and specify pf and pf2 as well.
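
A two-stage adaptive fit along these lines might look like the sketch below. The helper `adaptive_pf` is hypothetical; it follows the weight suggested for `pf`, taking absolute values so that the weights stay nonnegative (an assumption of this sketch). The column of `beta` used for the initial estimate is picked arbitrarily here; in practice it would be chosen by some model-selection step. The calls to `sdwd` run only if the package is installed.

```r
# Hypothetical helper: adaptive L1 weights 1 / (|beta| + 1/n).
adaptive_pf <- function(beta_init, n) {
  1 / (abs(beta_init) + 1 / n)
}

# Weights for a made-up initial estimate with n = 10 observations.
w <- adaptive_pf(c(0, -0.5, 2), n = 10)

if (requireNamespace("sdwd", quietly = TRUE)) {
  library(sdwd)
  data(colon)
  # Stage 1: initial elastic-net DWD fit.
  init <- sdwd(colon$x, colon$y, lambda2 = 1)
  # Arbitrary choice of a lambda index for the initial coefficients.
  k <- ceiling(length(init$lambda) / 2)
  beta_init <- as.matrix(init$beta)[, k]
  # Stage 2: adaptive elastic-net DWD with data-driven L1 weights.
  fit_adapt <- sdwd(colon$x, colon$y, lambda2 = 1,
                    pf = adaptive_pf(beta_init, nrow(colon$x)))
}
```

Predictors with a zero initial coefficient receive the largest weight, `n`, and are therefore penalized most heavily in the second stage.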

When the algorithm does not converge or runs slowly, consider increasing eps, decreasing nlambda, or increasing lambda.factor before increasing maxit.

Value

An object with S3 class sdwd.

`b0`	A vector of length `length(lambda)` representing the intercept at each `lambda` value.
`beta`	A matrix of dimension `p*length(lambda)` representing the coefficients at each `lambda` value. The matrix is stored as a sparse matrix (`Matrix` package). To convert it into a regular matrix, use `as.matrix()`.
`df`	The number of nonzero coefficients at each `lambda`.
`dim`	The dimension of coefficient matrix, i.e., `p*length(lambda)`.
`lambda`	The `lambda` sequence that was actually used.
`npasses`	Total number of iterations for all lambda values.
`jerr`	Warnings and errors; 0 if no error.
`call`	The call that produced this object.
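
Because `beta` is stored as a sparse matrix, it can help to see how such an object behaves. The sketch below uses a small made-up sparse matrix as a stand-in for `fit$beta` (4 predictors, 3 `lambda` values), converts it with `as.matrix()`, and counts the nonzero coefficients per column, which is analogous to what `df` reports. Whether `df` is computed exactly this way inside the package is an assumption.

```r
library(Matrix)

# Stand-in for fit$beta: rows are predictors, columns are lambda values.
# The numbers are made up for illustration.
beta <- Matrix(c(0,  0.0,  0.2,
                 0,  0.5,  0.7,
                 0,  0.0,  0.0,
                 0, -0.1, -0.3),
               nrow = 4, byrow = TRUE, sparse = TRUE)

dense <- as.matrix(beta)         # convert to a regular matrix
nonzero <- colSums(dense != 0)   # nonzero coefficients at each lambda
dim(dense)                       # p by length(lambda)
```

The count grows from left to right here, mirroring how more coefficients typically enter the model as `lambda` decreases along the path.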

Author(s)

Boxiang Wang and Hui Zou
Maintainer: Boxiang Wang boxiang-wang@uiowa.edu

References

Wang, B. and Zou, H. (2016) "Sparse Distance Weighted Discrimination", Journal of Computational and Graphical Statistics, 25(3), 826–838.
https://www.tandfonline.com/doi/full/10.1080/10618600.2015.1049700

Friedman, J., Hastie, T., and Tibshirani, R. (2010) "Regularization Paths for Generalized Linear Models via Coordinate Descent", Journal of Statistical Software, 33(1), 1–22.
https://www.jstatsoft.org/v33/i01/paper

Marron, J.S., Todd, M.J., and Ahn, J. (2007) "Distance-Weighted Discrimination", Journal of the American Statistical Association, 102(480), 1267–1271.
https://www.tandfonline.com/doi/abs/10.1198/016214507000001120

Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., and Tibshirani, Ryan (2012) "Strong Rules for Discarding Predictors in Lasso-type Problems", Journal of the Royal Statistical Society, Series B, 74(2), 245–266.
https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2011.01004.x

Yang, Y. and Zou, H. (2013) "An Efficient Algorithm for Computing the HHSVM and Its Generalizations", Journal of Computational and Graphical Statistics, 22(2), 396–415.
https://www.tandfonline.com/doi/full/10.1080/10618600.2012.680324

Examples

# load the data
data(colon)
# fit the elastic-net penalized DWD with lambda2=1
fit = sdwd(colon$x, colon$y, lambda2=1)
print(fit)
# coefficients at some lambda value
c1 = coef(fit, s=0.005)
# make predictions
predict(fit, newx=colon$x[1:10, ], s=c(0.01, 0.005))

[Package sdwd version 1.0.5 Index]