pcalda {mt}    R Documentation

Classification with PCALDA

Description

Classification using a combination of principal component analysis (PCA) and linear discriminant analysis (LDA).

Usage

pcalda(x, ...)

## Default S3 method:
pcalda(x, y, center = TRUE, scale. = FALSE, ncomp = NULL,
       tune=FALSE,...)

## S3 method for class 'formula'
pcalda(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

x

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

y

A factor specifying the class for each observation if no formula principal argument is given.

center

A logical value indicating whether the columns of x should be centred to have zero mean.

scale.

A logical value indicating whether the columns of x should be scaled to have unit variance before the analysis takes place.

ncomp

The number of principal components to be used in the classification. If NULL and tune=TRUE, it is the number of rows of x minus the number of classes in y. If NULL and tune=FALSE, it is half the number of rows of x.

tune

A logical value indicating whether the best number of components should be selected by tuning.

...

Arguments passed to or from other methods.

subset

An index vector specifying the cases to be used in the training sample.

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found.
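The default for ncomp described above can be written out as simple arithmetic. The following is only a sketch: default_ncomp is a hypothetical helper that is not part of mt, and the use of integer division for "half the number of rows" is an assumption, since the rounding rule is not documented here.

```r
## Hypothetical helper (not part of mt): the documented default for ncomp
default_ncomp <- function(x, y, tune = FALSE) {
  n <- nrow(x)
  if (tune) {
    ## number of rows minus the number of classes
    n - nlevels(factor(y))
  } else {
    ## half the number of rows; integer division is an assumption
    n %/% 2
  }
}

## iris has 150 rows and 3 classes
nc.tune   <- default_ncomp(iris[, 1:4], iris$Species, tune = TRUE)
nc.notune <- default_ncomp(iris[, 1:4], iris$Species, tune = FALSE)
```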

Details

A critical issue in applying linear discriminant analysis (LDA) is the singularity and instability of the within-class scatter matrix. In practice, a large number of features is often available, but the number of training patterns is limited and commonly smaller than the dimension of the feature space. To tackle this issue, pcalda combines PCA and LDA for classification. It uses PCA for dimension reduction; the rotated data resulting from PCA are then used as the input variables to LDA for classification.
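The two-stage idea can be sketched with prcomp from stats and lda from MASS. This is an illustration of the general PCA-then-LDA technique on the iris data, not the internal code of pcalda; the choice of ncomp = 2 and the object names are assumptions made for the example.

```r
library(MASS)  # provides lda()

x <- as.matrix(iris[, 1:4])
y <- iris$Species
ncomp <- 2     # illustrative choice, not a recommendation

## Stage 1: PCA for dimension reduction
## (centred, unscaled, mirroring the documented defaults of pcalda)
pca    <- prcomp(x, center = TRUE, scale. = FALSE)
scores <- pca$x[, seq_len(ncomp), drop = FALSE]

## Stage 2: LDA on the retained principal component scores
fit <- lda(scores, grouping = y)

## New observations must be rotated by the same PCA before prediction
new.scores <- predict(pca, newdata = x)[, seq_len(ncomp), drop = FALSE]
pred       <- predict(fit, new.scores)

head(pred$class)      # predicted class labels
head(pred$posterior)  # posterior probabilities per class
```

Because LDA only sees ncomp columns, the within-class scatter matrix it estimates is ncomp by ncomp, which stays well conditioned even when the original data have more features than observations.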

Value

An object of class pcalda containing the following components:

x

The training data projected onto the discriminant variables.

cl

The observed class labels of training data.

pred

The predicted class labels of training data.

posterior

The posterior probabilities for the predicted classes.

conf

The confusion matrix based on training data.

acc

The classification accuracy rate on the training data.

ncomp

The number of principal components used for classification.

pca.out

The output of PCA.

lda.out

The output of LDA.

call

The (matched) function call.
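The relationship between observed labels, predicted labels, a confusion matrix and an accuracy rate can be illustrated with base R. The toy labels below are invented for the example, and whether pcalda computes conf and acc in exactly this way is an assumption.

```r
## Toy observed and predicted labels (made up for illustration)
cl   <- factor(c("a", "a", "b", "b", "b", "c"))
pred <- factor(c("a", "b", "b", "b", "b", "c"), levels = levels(cl))

## Confusion matrix: rows are observed classes, columns are predicted classes
conf <- table(cl, pred)

## Accuracy: proportion of cases on the diagonal
acc <- sum(diag(conf)) / sum(conf)
```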

Note

This function may be called giving either a formula and optional data frame, or a matrix and grouping factor as the first two arguments.

Author(s)

Wanchang Lin

See Also

predict.pcalda, plot.pcalda, tune.func

Examples

data(abr1)
cl   <- factor(abr1$fact$class)
dat  <- abr1$pos

## split the data into training and test sets
idx <- sample(seq_len(nrow(dat)), round((2/3) * nrow(dat)), replace = FALSE)

## construct training and test data
train.dat  <- dat[idx, ]
train.t    <- cl[idx]
test.dat   <- dat[-idx, ]
test.t     <- cl[-idx]

## apply pcalda
model    <- pcalda(train.dat,train.t)
model
summary(model)

## plot
plot(model,dimen=c(1,2),main = "Training data",abbrev = TRUE)
plot(model,main = "Training data",abbrev = TRUE)

## confusion matrix on the test data
pred.te  <- predict(model, test.dat)$class
table(test.t, pred.te)


[Package mt version 2.0-1.20 Index]