msda {TULIP}

R Documentation

Fits a regularization path of Sparse Discriminant Analysis and predicts

Description

Fits a regularization path of Sparse Discriminant Analysis at a sequence of regularization parameters lambda and performs prediction when testing data are provided. The msda function solves classification problems by fitting a sparse discriminant analysis model. When covariates are provided, the function first adjusts the training data for them. It provides three models: binary, which fits a DSDA model for binary classification problems, and multi.original and multi.modified, which fit an MSDA model for multi-class classification problems. multi.original runs faster in low dimensions, but the dimension it can handle is limited. multi.modified has no such limitation and works in ultra-high dimensions. Users can specify the method through the model argument or rely on the default settings.

Usage

msda(x, z=NULL, y, testx=NULL,testz=NULL, model = NULL, lambda = NULL, 
 standardize=FALSE, alpha=1, nlambda = 100, 
 lambda.factor = ifelse((nobs - nclass)<= nvars, 0.2, 1e-03), dfmax = nobs, 
 pmax = min(dfmax * 2 + 20, nvars), pf = rep(1, nvars), eps = 1e-04, 
 maxit = 1e+06, sml = 1e-06, verbose = FALSE, perturb = NULL)

Arguments

`x`	Input matrix of predictors. `x` is of dimension `N \times p`; each row is an observation vector.
`z`	Input covariate matrix of dimension `N \times q`, where `q<N`. `z` can be omitted if there are no covariates.
`y`	Class label. This argument should be a factor for classification. For `model`=`'binary'`, `y` should be a binary variable with values 1 and 2. For `model`=`'multi.original'` or `'multi.modified'`, `y` should be a multi-class variable starting from 1.
`testx`	Input testing matrix. Each row is a test case. When `testx` is not provided, the function will only fit the model and return the classifier. When `testx` is provided, the function will predict response on `testx` as well.
`testz`	Input testing covariate matrix. Can be omitted if there are no covariates. Training covariates `z` and testing covariates `testz` must either both be provided or both be omitted.
`model`	Method type. The `model` argument can be one of `'binary'`, `'multi.original'`, `'multi.modified'` and the default is NULL. The function supports fitting DSDA and MSDA models by specifying method type. Without specification, the function will automatically choose one of the methods. If the response variable is binary, the function will fit a DSDA model. If the response variable is multi-class, the function will fit an original MSDA model for dimension `p<=2000` and a modified MSDA model for dimension `p>2000`.
`lambda`	A user-supplied `lambda` sequence. Typically, users leave this option unspecified and let the program compute its own `lambda` sequence based on `nlambda` and `lambda.factor`. Supplying a value of `lambda` overrides this. It is better to supply a decreasing sequence of `lambda` values than a single (small) value. If the supplied sequence is not decreasing, the program automatically sorts it in decreasing order.
`standardize`	A logical value indicating whether `x` should be standardized before performing DSDA. Default is FALSE. This argument is only valid for `model = 'binary'`.
`alpha`	The elasticnet mixing parameter, the same as in glmnet. Default is alpha=1 so that the lasso penalty is used in DSDA. This argument is only valid for `model = 'binary'`.
`nlambda`	The number of tuning values in sequence `lambda`. If users do not specify `lambda` values, the package will generate a solution path containing `nlambda` many tuning values of `lambda`. Default is 100 for `model = 'multi.original'` and 50 for `model = 'multi.modified'`.
`lambda.factor`	The factor for getting the minimal lambda in the `lambda` sequence, where `min(lambda)` = `lambda.factor` * `max(lambda)`. `max(lambda)` is the smallest value of `lambda` for which all coefficients are zero. The default depends on `p` (the number of predictors) and its relationship with `N` (the number of rows in the matrix of predictors). For the original MSDA, if `N - K > p`, where `K` is the number of classes, the default is `0.001`, close to zero; otherwise the default is `0.2` (this matches the expression shown in Usage). For the modified MSDA, if `p\le 5000`, the default is `0.2`. If `5000<p\le 30000`, the default is `0.4`. If `p>30000`, the default is `0.5`. A very small value of `lambda.factor` will lead to a saturated fit. It has no effect if a user-defined `lambda` sequence is supplied. This argument is only valid for `multi.original` and `multi.modified`.
`dfmax`	The maximum number of selected variables in the model. Default is the number of observations `N`. This argument is only valid for `multi.original` and `multi.modified`.
`pmax`	The maximum number of variables that can be selected during the iterations. In intermediate steps, the algorithm can select at most `pmax` variables and then shrinks some of them so that the number of variables finally selected is less than `dfmax`. Default is `\min(dfmax\times 2+20, p)`, as shown in Usage.
`pf`	L1 penalty factor of length `p`. Separate L1 penalty weights can be applied to each coefficient of `\theta` to allow differential L1 shrinkage. Can be 0 for some variables, which implies no L1 shrinkage and results in those variables always being included in the model. Default is 1 for all variables. This argument is only valid for `multi.original` and `multi.modified`.
`eps`	Convergence threshold for coordinate descent. Each inner coordinate descent loop continues until the relative change in any coefficient is less than `eps`. Default value is `1e-4`.
`maxit`	Maximum number of outer-loop iterations allowed at fixed lambda value. Default is 1e6. If models do not converge, consider increasing `maxit`. This argument is only valid for `multi.original` and `multi.modified`.
`sml`	Threshold for the ratio of the change in the loss function after each iteration to the previous loss function value. Default is `1e-06`. This argument is only valid for `multi.original` and `multi.modified`.
`verbose`	Whether to print out computation progress. The default is `FALSE`. This argument is only valid for `multi.original` and `multi.modified`.
`perturb`	A scalar number. If it is specified, the number will be added to each diagonal element of the covariance matrix as perturbation. The default is `NULL`. This argument is only valid for `multi.original` and `multi.modified`.
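
As a rough illustration of how `nlambda` and `lambda.factor` shape the default path, the sketch below builds a decreasing sequence running from `max(lambda)` down to `lambda.factor * max(lambda)`. The helper name `make_lambda_path`, the value of `lambda_max`, and the log-scale spacing (the convention used by glmnet) are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical helper, not part of TULIP: builds a decreasing lambda path.
# Assumes log-scale spacing between max(lambda) and min(lambda).
make_lambda_path <- function(lambda_max, lambda.factor = 0.2, nlambda = 100) {
  lambda_min <- lambda.factor * lambda_max  # min(lambda) = lambda.factor * max(lambda)
  exp(seq(log(lambda_max), log(lambda_min), length.out = nlambda))
}

# Made-up value for max(lambda); in practice it is computed from the data.
lam <- make_lambda_path(lambda_max = 2, lambda.factor = 0.2, nlambda = 5)

# A user-supplied sequence in arbitrary order, sorted in decreasing order
# as the description of the lambda argument says the function does.
user_lam <- sort(c(0.05, 0.5, 0.1), decreasing = TRUE)
```

A larger `lambda.factor` or a smaller `nlambda` shortens the path, which is why both are suggested in Details as ways to reduce computing time.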

Details

The msda function fits a linear discriminant analysis model for vector X as follows:

\mathbf{X}|Y=k\sim N(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}).

The categorical response is predicted from the Bayes rule:

\widehat{Y}=\arg\max_{k=1,\cdots,K}\left\{\left(\mathbf{X}-\frac{\boldsymbol{\mu}_k}{2}\right)^T\boldsymbol{\beta}_k+\log\pi_k\right\}.
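
The rule above can be sketched in base R for a single observation. The helper `bayes_rule` and the toy values are hypothetical and not part of TULIP. The sketch assumes `mu` and `beta` are stored as `p x K` matrices with one column per class, and the toy example assumes the standard LDA definition `beta_k = Sigma^{-1} mu_k` with an identity covariance, so that `beta` equals `mu`.

```r
# Hypothetical helper, not part of TULIP: applies the Bayes rule to one observation.
# x: numeric vector of length p; mu, beta: p x K matrices (column k for class k);
# prior: length-K vector of class probabilities.
bayes_rule <- function(x, mu, beta, prior) {
  scores <- vapply(seq_len(ncol(mu)), function(k) {
    sum((x - mu[, k] / 2) * beta[, k]) + log(prior[k])
  }, numeric(1))
  which.max(scores)  # index of the class with the largest score
}

# Toy setting with p = 2 and K = 3; identity covariance is assumed, so beta = mu.
mu <- cbind(c(0, 0), c(3, 0), c(0, 3))
beta <- mu
prior <- rep(1 / 3, 3)

pred_a <- bayes_rule(c(2.8, 0.1), mu, beta, prior)   # point near the class 2 mean
pred_b <- bayes_rule(c(0.2, 2.9), mu, beta, prior)   # point near the class 3 mean
pred_c <- bayes_rule(c(0.1, -0.2), mu, beta, prior)  # point near the class 1 mean
```

In a sparse fit, rows of `beta` that are entirely zero correspond to predictors that do not influence the predicted class.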

The parameter model specifies which method to use in estimating \boldsymbol{\beta}. Users can use binary for binary problems, and multi.original and multi.modified for multi-class problems. In multi.original, the algorithm first computes and stores \boldsymbol{\Sigma}, whereas multi.modified does not compute or store the entire covariance matrix. Since the algorithm works element by element, multi.modified computes each element of the covariance matrix only when it is needed. Therefore, multi.original is faster in low dimensions, but multi.modified can fit models in much higher dimensions.

For the sake of computing speed, if models are not converging or are running slowly, consider increasing eps and sml, decreasing nlambda, or increasing lambda.factor before increasing maxit. Users can also reduce dfmax to limit the maximum number of variables in the model.

The argument list covers all parameters of the three models, but not all of them are needed for any one method. See the explanation of each argument for more detail. The output of the DSDA model includes only beta and lambda.

Value

An object with S3 class dsda, msda.original, or msda.modified.

`beta`	Output variable coefficients for each `lambda`, which are the estimates of `\boldsymbol{\beta}` in the Bayes rule. `beta` is a list whose length is the number of `lambda` values. Each element of `beta` is a matrix of dimension `nvars\times (nclass-1)`. For `model = 'binary'`, `beta` is a vector of length `nvars+1`, where the first element is the intercept.
`df`	The number of nonzero coefficients for each value of `lambda`.
`obj`	The fitted value of the objective function for each value of `lambda`.
`dim`	Dimension of each coefficient matrix.
`lambda`	The actual `lambda` sequence used. The user specified sequence or automatically generated sequence could be truncated by constraints on `dfmax` and `pmax`.
`x`	The input matrix of predictors for training.
`y`	Class label in training data.
`npasses`	Total number of iterations (of the innermost loop) summed over all lambda values.
`jerr`	Error flag, for warnings and errors, 0 if no error.
`sigma`	Estimated sigma matrix. This component is only available in objects of class `msda.original`.
`delta`	Estimated delta matrix. delta[k] = mu[k]-mu[1].
`mu`	Estimated mu vector.
`prior`	Prior probability that y belongs to class k, estimated by the proportion of training observations in class k.
`call`	The call that produced this object.
`pred`	Predicted categorical response for each value in sequence `lambda` when `testx` is provided.
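
The quantities `prior`, `mu` and `delta` described above can be illustrated in base R as follows. This is a minimal sketch on made-up data that follows the definitions in this section; storing the class means as columns of a `p x K` matrix is an assumption, and the layout inside the fitted object may differ.

```r
# Toy training data (made up): 6 observations, 2 predictors, 3 classes.
x <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8,
              9, 10,
              11, 12), ncol = 2, byrow = TRUE)
y <- c(1, 1, 2, 2, 3, 3)

K <- length(unique(y))

# Prior probabilities: proportion of observations in each class.
prior <- as.numeric(table(y)) / length(y)

# Class means, one column per class (p x K).
mu <- sapply(seq_len(K), function(k) colMeans(x[y == k, , drop = FALSE]))

# Mean differences relative to class 1: column k is mu[k] - mu[1].
delta <- mu - mu[, 1]
```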

Author(s)

Yuqing Pan, Qing Mai, Xin Zhang

References

Mai, Q., Zou, H. and Yuan, M. (2012), "A direct approach to sparse discriminant analysis in ultra-high dimensions." Biometrika, 99, 29-42.

Mai, Q., Yang, Y., and Zou, H. (2017), "Multiclass sparse discriminant analysis." Statistica Sinica, in press.

URL: https://github.com/emeryyi/msda

Examples

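library(TULIP)
# Load the example data set shipped with the package
data(GDS1615)
x <- GDS1615$x  # predictor matrix
y <- GDS1615$y  # class labels
# Fit the regularization path; with model = NULL the method is chosen automatically
obj <- msda(x = x, y = y)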

[Package TULIP version 1.0.2 Index]