msda {TULIP}    R Documentation

Fits a regularization path of Sparse Discriminant Analysis and predicts

Description

Fits a regularization path of Sparse Discriminant Analysis at a sequence of regularization parameters lambda, and performs prediction when testing data are provided. The msda function solves classification problems by fitting a sparse discriminant analysis model. When covariates are provided, the function first adjusts the training data for them. It provides three models: binary, which fits a DSDA model to binary classification problems, and multi.original and multi.modified, which fit MSDA models to multi-class classification problems. multi.original runs faster in low dimensions but cannot handle relatively large dimensions; multi.modified has no such limitation and works in ultra-high dimensions. Users can specify the method through the model argument or use the default setting.

Usage

msda(x, z=NULL, y, testx=NULL,testz=NULL, model = NULL, lambda = NULL, 
 standardize=FALSE, alpha=1, nlambda = 100, 
 lambda.factor = ifelse((nobs - nclass)<= nvars, 0.2, 1e-03), dfmax = nobs, 
 pmax = min(dfmax * 2 + 20, nvars), pf = rep(1, nvars), eps = 1e-04, 
 maxit = 1e+06, sml = 1e-06, verbose = FALSE, perturb = NULL)

Arguments

x

Input matrix of predictors. x is of dimension N \times p; each row is an observation vector.

z

Input covariate matrix of dimension N \times q, where q<N. z can be omitted if there are no covariates.

y

Class label. This argument should be a factor for classification. For model='binary', y should be a binary variable with values 1 and 2. For model='multi.original' or 'multi.modified', y should be a multi-class variable whose labels start from 1.

testx

Input testing matrix. Each row is a test case. When testx is not provided, the function only fits the model and returns the classifier. When testx is provided, the function also predicts the response for testx.

testz

Input testing covariate matrix. Can be omitted if there are no covariates. However, the training covariates z and the testing covariates testz must either both be provided or both be omitted.

model

Method type. The model argument can be one of 'binary', 'multi.original' and 'multi.modified'; the default is NULL. The function fits a DSDA or an MSDA model according to the method type. Without specification, the function chooses a method automatically: if the response variable is binary, it fits a DSDA model; if the response variable is multi-class, it fits an original MSDA model when p<=2000 and a modified MSDA model when p>2000.

lambda

A user-supplied lambda sequence. Typically, users leave this option unspecified and let the program compute its own lambda sequence based on nlambda and lambda.factor. Supplying a value of lambda overrides this. It is better to supply a decreasing sequence of lambda values than a single (small) value. If the supplied sequence is not decreasing, the program sorts it in decreasing order automatically.

standardize

A logical value indicating whether x should be standardized before fitting the DSDA model. Default is FALSE. This argument is only valid for model = 'binary'.

alpha

The elastic-net mixing parameter, as in glmnet. The default alpha=1 uses the lasso penalty in DSDA. This argument is only valid for model = 'binary'.

nlambda

The number of tuning values in the lambda sequence. If users do not specify lambda values, the package generates a solution path containing nlambda values of lambda. Default is 100 for model = 'multi.original' and 50 for model = 'multi.modified'.

lambda.factor

The factor for obtaining the minimal lambda in the lambda sequence, where min(lambda) = lambda.factor * max(lambda) and max(lambda) is the smallest value of lambda for which all coefficients are zero. The default depends on p (the number of predictors) and its relationship with N (the number of rows in the matrix of predictors). For original MSDA, the default is 0.0001 (close to zero) if N > p, and 0.2 if N < p. For modified MSDA, the default is 0.2 if p \le 5000, 0.4 if 5000 < p \le 30000, and 0.5 if p > 30000. A very small value of lambda.factor leads to a saturated fit. The argument has no effect if a user-defined lambda sequence is supplied. This argument is only valid for multi.original and multi.modified.

dfmax

The maximum number of selected variables in the model. Default is the number of observations N. This argument is only valid for multi.original and multi.modified.

pmax

The maximum number of potentially selected variables during iteration. In intermediate steps, the algorithm can select at most pmax variables and then shrink some of them so that the number of finally selected variables does not exceed dfmax. Default is \min(dfmax\times 2+20, p).

pf

L1 penalty factor of length p. Separate L1 penalty weights can be applied to each coefficient of \theta to allow differential L1 shrinkage. A weight of 0 for a variable implies no L1 shrinkage, so that variable is always included in the model. Default is 1 for all variables. This argument is only valid for multi.original and multi.modified.

eps

Convergence threshold for coordinate descent. Each inner coordinate descent loop continues until the relative change in any coefficient is less than eps. The default value is 1e-4.

maxit

Maximum number of outer-loop iterations allowed at fixed lambda value. Default is 1e6. If models do not converge, consider increasing maxit. This argument is only valid for multi.original and multi.modified.

sml

Threshold for the ratio of the change in the loss function after each iteration to the previous value of the loss function. Default is 1e-06. This argument is only valid for multi.original and multi.modified.

verbose

Whether to print out computation progress. The default is FALSE. This argument is only valid for multi.original and multi.modified.

perturb

A scalar. If it is specified, this value is added to each diagonal element of the covariance matrix as a perturbation. The default is NULL. This argument is only valid for multi.original and multi.modified.
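As a reading aid, the automatic choices described above for model and lambda.factor, and the glmnet-style meaning of alpha, can be restated as small base-R functions. The names choose_model, default_lambda_factor_modified and enet_penalty are hypothetical and are not exported by TULIP. The sketch assumes that the number of classes is the number of distinct values of y, and that DSDA uses glmnet's definition of the elastic-net penalty, as the alpha entry suggests.

```r
# Hypothetical helpers restating documented defaults; not part of TULIP.

# Rule used when model = NULL (assumes nclass = number of distinct labels).
choose_model <- function(y, p) {
  nclass <- length(unique(y))
  if (nclass == 2) {
    "binary"              # DSDA for a binary response
  } else if (p <= 2000) {
    "multi.original"      # MSDA storing the full covariance matrix
  } else {
    "multi.modified"      # MSDA computing covariance elements on demand
  }
}

# Documented default lambda.factor for model = 'multi.modified'.
default_lambda_factor_modified <- function(p) {
  if (p <= 5000) 0.2 else if (p <= 30000) 0.4 else 0.5
}

# glmnet-style elastic-net penalty (assumed for DSDA):
# (1 - alpha) / 2 * ||b||_2^2 + alpha * ||b||_1
enet_penalty <- function(b, alpha = 1) {
  (1 - alpha) / 2 * sum(b^2) + alpha * sum(abs(b))
}

choose_model(y = c(1, 2, 1, 2), p = 500)     # "binary"
choose_model(y = c(1, 2, 3, 1), p = 500)     # "multi.original"
choose_model(y = c(1, 2, 3, 1), p = 10000)   # "multi.modified"
default_lambda_factor_modified(10000)        # 0.4
enet_penalty(c(1, -2), alpha = 1)            # 3 (pure lasso)
```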

Details

The msda function fits a linear discriminant analysis model for vector X as follows:

\mathbf{X}|Y=k\sim N(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}).

The categorical response is predicted from the Bayes rule:

\widehat{Y}=\arg\max_{k=1,\cdots,K}{(\mathbf{X}-\frac{\boldsymbol{\mu}_k}{2})^T\boldsymbol{\beta}_k+\log\pi_k}.
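Under the Gaussian model above, taking \boldsymbol{\beta}_k=\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k makes the displayed rule the usual linear discriminant (Bayes) rule. The sketch below evaluates the rule in base R with known parameters on a toy example. The function name bayes_rule is hypothetical, and the sketch does not reproduce how msda stores its sparse nvars \times (nclass-1) estimates or forms predictions internally.

```r
# Sketch of the Bayes rule displayed above (base R only).
# newx:  n x p matrix of observations to classify
# mu:    p x K matrix whose k-th column is mu_k
# beta:  p x K matrix whose k-th column is beta_k
# prior: length-K vector of class probabilities pi_k
bayes_rule <- function(newx, mu, beta, prior) {
  K <- ncol(mu)
  scores <- sapply(seq_len(K), function(k) {
    # (x - mu_k / 2)^T beta_k + log(pi_k) for every row x of newx
    drop(sweep(newx, 2, mu[, k] / 2) %*% beta[, k]) + log(prior[k])
  })
  scores <- matrix(scores, ncol = K)       # keep a matrix even when n = 1
  max.col(scores, ties.method = "first")   # index of the largest score
}

# Toy two-class example with Sigma = I, so beta_k = solve(Sigma, mu_k) = mu_k.
mu    <- cbind(c(0, 0), c(2, 2))
Sigma <- diag(2)
beta  <- solve(Sigma, mu)
prior <- c(0.5, 0.5)
newx  <- rbind(c(0.1, -0.2), c(1.9, 2.3))
bayes_rule(newx, mu, beta, prior)   # 1 2
```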

The parameter model specifies which method to use in estimating \boldsymbol{\beta}. Users can use binary for binary problems and multi.original or multi.modified for multi-class problems. In multi.original, the algorithm first computes and stores \boldsymbol{\Sigma}, whereas multi.modified neither computes nor stores the entire covariance matrix. Because the algorithm works element-wise, multi.modified computes each element of the covariance matrix only when it is needed. Therefore, multi.original is faster in low dimensions, while multi.modified can fit models in much higher dimensions.
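The difference between the two MSDA variants can be illustrated with a pooled within-class covariance estimate: one function builds and stores the full p \times p matrix, optionally adding perturb to its diagonal, while the other computes a single element on demand. The function names, the pooled estimator and the divisor N - K are assumptions made for illustration; the package's internal covariance estimator may differ.

```r
# Full pooled within-class covariance (stored p x p), in the spirit of multi.original.
pooled_cov <- function(x, y, perturb = NULL) {
  classes <- sort(unique(y))
  xc <- x
  for (k in classes) {
    idx <- y == k
    # center the rows of class k at the class mean
    xc[idx, ] <- sweep(x[idx, , drop = FALSE], 2, colMeans(x[idx, , drop = FALSE]))
  }
  sigma <- crossprod(xc) / (nrow(x) - length(classes))  # divisor N - K (assumed)
  if (!is.null(perturb)) diag(sigma) <- diag(sigma) + perturb  # diagonal perturbation
  sigma
}

# Single element sigma[j, l] computed on demand, in the spirit of multi.modified.
pooled_cov_elem <- function(x, y, j, l) {
  classes <- sort(unique(y))
  s <- 0
  for (k in classes) {
    idx <- y == k
    s <- s + sum((x[idx, j] - mean(x[idx, j])) * (x[idx, l] - mean(x[idx, l])))
  }
  s / (nrow(x) - length(classes))
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)
y <- rep(1:2, each = 10)
all.equal(pooled_cov(x, y)[1, 2], pooled_cov_elem(x, y, 1, 2))   # TRUE
```

The first function needs O(p^2) memory; the second needs only the two columns involved, which is the trade-off the paragraph above describes.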

For computational speed, if models are not converging or are running slowly, consider increasing eps and sml, decreasing nlambda, or increasing lambda.factor before increasing maxit. Users can also reduce dfmax to limit the maximum number of variables in the model.
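The tuning advice above can be read against a sketch of a lambda path and of the sml stopping rule. The sketch assumes a log-linearly spaced path from max(lambda) down to lambda.factor * max(lambda), as glmnet uses; the documentation only fixes the two endpoints. The helper names lambda_path and loss_converged are hypothetical.

```r
# Decreasing path of nlambda values from lambda_max to lambda.factor * lambda_max,
# assuming geometric (log-linear) spacing.
lambda_path <- function(lambda_max, lambda.factor, nlambda) {
  exp(seq(log(lambda_max), log(lambda.factor * lambda_max), length.out = nlambda))
}

# sml-style check: change of the loss relative to its previous value.
loss_converged <- function(old_loss, new_loss, sml = 1e-06) {
  abs(new_loss - old_loss) / abs(old_loss) < sml
}

round(lambda_path(1, 0.2, 5), 4)   # 1.0000 0.6687 0.4472 0.2991 0.2000
loss_converged(10, 10 + 1e-6)      # TRUE: relative change 1e-7
loss_converged(10, 9)              # FALSE: relative change 0.1
```

A larger lambda.factor raises the smallest lambda on the path, so the densest fits at the end of the path, which tend to be the slowest, are never attempted; a smaller nlambda simply means fewer fits.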

The argument list covers the parameters of all three models, but not every argument applies to each method. See the description of each argument for details. Note also that the output of the DSDA model only includes beta and lambda.

Value

An object with S3 class dsda, msda.original or msda.modified.

beta

Output variable coefficients for each lambda, which estimate \boldsymbol{\beta} in the Bayes rule. beta is a list whose length equals the number of lambda values. Each element of beta is a matrix of dimension nvars\times (nclass-1). For model = 'binary' (class dsda), beta is a vector of length nvars+1 whose first element is the intercept.

df

The number of nonzero coefficients for each value of lambda.

obj

The fitted value of the objective function for each value of lambda.

dim

Dimension of each coefficient matrix.

lambda

The actual lambda sequence used. The user specified sequence or automatically generated sequence could be truncated by constraints on dfmax and pmax.

x

The input matrix of predictors for training.

y

Class label in training data.

npasses

Total number of iterations of the innermost loop, summed over all lambda values.

jerr

Error flag, for warnings and errors, 0 if no error.

sigma

Estimated sigma matrix. This component is only available in objects of class msda.original.

delta

Estimated delta matrix. delta[k] = mu[k]-mu[1].

mu

Estimated mu vector.

prior

Prior probability that y belongs to class k, estimated by the proportion of training observations in class k, i.e. mean(y == k).

call

The call that produced this object.

pred

Predicted categorical response for each value in sequence lambda when testx is provided.
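The mu, delta and prior components follow directly from their definitions above. The sketch below computes them from raw training data in base R. The function name class_summaries is hypothetical; the sketch ignores covariate adjustment through z, and the exact layout of the corresponding components in a fitted msda object may differ.

```r
# Class means, mean differences and class proportions from training data.
class_summaries <- function(x, y) {
  classes <- sort(unique(y))
  # p x K matrix: column k holds the mean of x over class k
  mu <- matrix(sapply(classes, function(k) colMeans(x[y == k, , drop = FALSE])),
               nrow = ncol(x))
  delta <- mu - mu[, 1]                               # column k: mu_k - mu_1
  prior <- sapply(classes, function(k) mean(y == k))  # proportion of class k
  list(mu = mu, delta = delta, prior = prior)
}

x <- rbind(c(0, 0), c(2, 0), c(4, 4), c(6, 4))
y <- c(1, 1, 2, 2)
s <- class_summaries(x, y)
s$mu      # columns (1, 0) and (5, 4)
s$delta   # columns (0, 0) and (4, 4)
s$prior   # 0.5 0.5
```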

Author(s)

Yuqing Pan, Qing Mai, Xin Zhang

References

Mai, Q., Zou, H., and Yuan, M. (2012), "A direct approach to sparse discriminant analysis in ultra-high dimensions." Biometrika, 99, 29-42.

Mai, Q., Yang, Y., and Zou, H. (2017), "Multiclass sparse discriminant analysis." Statistica Sinica, in press.

URL: https://github.com/emeryyi/msda

See Also

cv.msda, predict.msda

Examples

library(TULIP)
data(GDS1615)
x <- GDS1615$x
y <- GDS1615$y
obj <- msda(x = x, y = y)

[Package TULIP version 1.0.2 Index]