ASDA {accSDA}    R Documentation
Accelerated Sparse Discriminant Analysis
Description
Applies an accelerated proximal gradient algorithm, a proximal gradient algorithm, or an alternating direction method of multipliers (ADMM) algorithm to the optimal scoring formulation of sparse discriminant analysis proposed by Clemmensen et al. (2011):
\mathrm{argmin}_{\theta,\beta}\; \|Y_t\theta - X_t\beta\|_2^2 + \gamma\,\beta^T\Omega\beta + \lambda\|\beta\|_1
where Y_t is the n by K indicator matrix, X_t the n by p data matrix, Omega the generalized elastic net parameter matrix, and gamma and lambda correspond to the arguments gam and lam below.
Usage
ASDA(Xt, ...)
## Default S3 method:
ASDA(
Xt,
Yt,
Om = diag(p),
gam = 0.001,
lam = 1e-06,
q = K - 1,
method = "SDAAP",
control = list(),
...
)
Arguments
Xt
n by p data matrix (can also be a data.frame that can be coerced to a matrix).
...
Additional arguments for the underlying lda fit (see the fit component under Value).
Yt
n by K matrix of indicator variables (Y_ij = 1 if observation i is in class j). This will later be changed to handle factor variables as well. Each observation belongs to a single class, so in a given row only one element is 1 and the rest are 0. (A sketch of building such a matrix from a factor follows this list.)
Om
p by p parameter matrix Omega in the generalized elastic net penalty.
gam
Regularization parameter for the elastic net penalty.
lam
Regularization parameter for the l1 penalty; must be greater than zero. If cross-validation is used (CV = TRUE in the control list), then lam should be a vector containing the regularization values to be tested.
q
Desired number of discriminant vectors.
method
Character vector selecting the optimization method; one of the three values "SDAP" (proximal gradient), "SDAAP" (accelerated proximal gradient, the default) or "SDAD" (alternating direction method of multipliers). Note that further parameters are passed to the chosen method in the argument control.
control
List of control arguments. See Details.
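For illustration, a minimal sketch of building a Yt indicator matrix from a factor by hand; model.matrix is base R, and the Examples below simply pass the factor itself.

# One column per class; each row has a single 1 in the column
# of that observation's class.
y  <- iris$Species[c(1:40, 51:90, 101:140)]
Yt <- model.matrix(~ y - 1)
colnames(Yt) <- levels(y)
head(Yt)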
Details
The control list contains the following entries to tune the algorithms further; a sketch combining several of them in a cross-validation setup follows the list.
PGsteps
Maximum number of iterations of the inner proximal gradient/ADMM algorithm for finding beta. Default value is 1000.
PGtol
Stopping tolerance for the inner method. If the method is SDAD, then this must be a vector of two values: absolute tolerance (first element) and relative tolerance (second element). Default value is 1e-5 for both tolerances.
maxits
Number of iterations to run. Default value is 250.
tol
Stopping tolerance. Default value is 1e-3.
mu
Penalty parameter for the augmented Lagrangian term; must be greater than zero and only needs to be specified when using method SDAD. Default value is 1.
CV
Logical value which is TRUE if cross-validation is to be performed. If cross-validation is performed, then lam should be specified as a vector containing the regularization values to be tested. Default value is FALSE.
folds
Integer determining the number of folds in cross-validation. Not needed if CV is not specified. Default value is 5.
feat
Maximum fraction of nonzero features desired in the validation scheme. Not needed if CV is not specified. Default value is 0.15.
quiet
Set to FALSE if status updates should be printed to the R console. Default value is TRUE. Note that setting it to FALSE triggers a lot of printing to the console.
ordinal
Set to TRUE if the labels are ordinal. Only available for methods SDAAP and SDAD.
initTheta
Option to set the initial theta vector; by default it is a vector of all ones for the first theta.
bt
Logical indicating whether backtracking should be used; only applies to the proximal gradient based methods. By default, backtracking is not used.
L
Initial estimate of the Lipschitz constant used for backtracking. Default value is 0.25.
eta
Scaling factor for the Lipschitz constant. Default value is 1.25.
rankRed
Logical indicating whether Om is factorized such that R^T R = Om; currently only applicable to the accelerated proximal gradient method.
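As a sketch of how these entries combine, the following sets up cross-validation over a small grid of lam values, reusing the iris training set from the Examples below and the default method SDAAP; the grid values and the name cvCtrl are illustrative, not recommendations.

library(accSDA)
train  <- c(1:40, 51:90, 101:140)
nX     <- normalize(as.matrix(iris[train, 1:4]))
cvCtrl <- list(CV = TRUE,    # cross-validate over the lam grid
               folds = 5,    # number of folds
               feat = 0.15,  # max fraction of nonzero features accepted
               quiet = TRUE)
res <- ASDA(nX$Xc, iris[train, 5], lam = c(1e-4, 1e-3, 1e-2),
            control = cvCtrl)
res$lambda                   # best value found by cross-validation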
Value
ASDA returns an object of class "ASDA" including a list with the following named components; a short sketch inspecting them follows the list.
call
The matched call.
B
p by q matrix of discriminant vectors, i.e. sparse loadings.
Q
K by q matrix of scoring vectors, i.e. optimal scores.
varNames
Names of the predictors used, i.e. column names of Xt.
origP
Number of variables in Xt.
fit
Output from the function lda on the projected data. This is NULL if the trivial solution is found, i.e. B is all zeroes; use lower values of lam if that is the case.
classes
The classes in Yt.
lambda
The lambda/lam used; the best value found by cross-validation if CV is TRUE.
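To see how these components fit together, a short sketch inspecting a fitted object; res here stands for any object produced as in the Examples below.

dim(res$B)                    # p by q discriminant vectors
sum(rowSums(abs(res$B)) > 0)  # number of variables actually selected
res$classes                   # class labels taken from Yt
res$lambda                    # lam used (best CV value when CV is TRUE)
if (is.null(res$fit)) message("Trivial solution; try a smaller lam.")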
Note
The input matrix Xt should be normalized, i.e. each column corresponding to a variable should have its mean subtracted and be scaled to unit length. The functions normalize and normalizetest are supplied in the package for this purpose; a short sketch of their use follows.
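A minimal sketch of this preprocessing, checking the centering and unit-length scaling described above; Xc is the centered and scaled matrix returned by normalize, as used in the Examples.

library(accSDA)
X  <- as.matrix(iris[1:100, 1:4])
nX <- normalize(X)
colMeans(nX$Xc)          # close to 0: each column is mean-centered
sqrt(colSums(nX$Xc^2))   # close to 1 if columns are scaled to unit length
Xnew <- normalizetest(as.matrix(iris[101:150, 1:4]), nX)  # same transform on new data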
See Also
normalize, normalizetest and predict.ASDA.
Examples
set.seed(123)
# Prepare training and test set
train <- c(1:40,51:90,101:140)
Xtrain <- iris[train,1:4]
nX <- normalize(Xtrain)
Xtrain <- nX$Xc
Ytrain <- iris[train,5]
Xtest <- iris[-train,1:4]
Xtest <- normalizetest(Xtest,nX)
Ytest <- iris[-train,5]
# Define parameters for Alternating Direction Method of Multipliers (SDAD)
Om <- diag(4) + 0.1*matrix(1, 4, 4) # elastic net coefficient matrix Omega
gam <- 0.0001
lam <- 0.0001
method <- "SDAD"
q <- 2
control <- list(PGsteps = 100,
                PGtol = c(1e-5, 1e-5),
                mu = 1,
                maxits = 100,
                tol = 1e-3,
                quiet = FALSE)
# Run the algorithm
res <- ASDA(Xt = Xtrain,
            Yt = Ytrain,
            Om = Om,
            gam = gam,
            lam = lam,
            q = q,
            method = method,
            control = control)
# Can also just use the defaults, which use Accelerated Proximal Gradient (SDAAP):
resDef <- ASDA(Xtrain,Ytrain)
# An example on simulated data
# Generate Gaussian data on three classes with plenty of redundant variables
# This example shows the basic steps on how to apply this to data, i.e.:
# 1) Setup training data
# 2) Normalize
# 3) Train
# 4) Predict
# 5) Plot projected data
# 6) Accuracy on test set
P <- 300 # Number of variables
N <- 50 # Number of samples per class
# Mean for classes, they are zero everywhere except the first 3 coordinates
m1 <- rep(0,P)
m1[1] <- 3
m2 <- rep(0,P)
m2[2] <- 3
m3 <- rep(0,P)
m3[3] <- 3
# Sample dummy data
Xtrain <- rbind(MASS::mvrnorm(n = N, mu = m1, Sigma = diag(P)),
                MASS::mvrnorm(n = N, mu = m2, Sigma = diag(P)),
                MASS::mvrnorm(n = N, mu = m3, Sigma = diag(P)))
Xtest <- rbind(MASS::mvrnorm(n = N, mu = m1, Sigma = diag(P)),
               MASS::mvrnorm(n = N, mu = m2, Sigma = diag(P)),
               MASS::mvrnorm(n = N, mu = m3, Sigma = diag(P)))
# Generate the labels
Ytrain <- factor(rep(1:3,each=N))
Ytest <- Ytrain
# Normalize the data
Xt <- accSDA::normalize(Xtrain)
Xtrain <- Xt$Xc # Use the centered and scaled data
Xtest <- accSDA::normalizetest(Xtest,Xt)
# Train the classifier and increase the sparsity parameter from the default
# so we penalize more for non-sparse solutions.
res <- accSDA::ASDA(Xtrain,Ytrain,lam=0.01)
# Plot the projected training data; it is projected to 2 dimensions
# because we have 3 classes (the number of discriminant vectors is at
# most the number of classes minus 1).
XtrainProjected <- Xtrain %*% res$B
plot(XtrainProjected[,1], XtrainProjected[,2], col = Ytrain)
# Predict on the test data
preds <- predict(res, newdata = Xtest)
# Plot projected test data with predicted and correct labels
XtestProjected <- Xtest %*% res$B
plot(XtestProjected[,1], XtestProjected[,2], col = Ytest,
     main = "Projected test data with original labels")
plot(XtestProjected[,1], XtestProjected[,2], col = preds$class,
     main = "Projected test data with predicted labels")
# Calculate accuracy
sum(preds$class == Ytest)/(3*N) # We have N samples per class, so total 3*N
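# As a follow-up to the overall accuracy, a per-class breakdown via a
# confusion matrix (base R table); preds and Ytest are from above.
confusion <- table(predicted = preds$class, actual = Ytest)
confusion
diag(confusion) / colSums(confusion)  # per-class accuracy (recall)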