classifyV {MGSDA}    R Documentation
Classification for MGSDA
Description
Classify observations in the test set using the supplied matrix of canonical vectors V and the training set.
Usage
classifyV(Xtrain, Ytrain, Xtest, V, prior = TRUE, tol1 = 1e-10)
Arguments
Xtrain
An N x p data matrix; N observations on the rows and p features on the columns.
Ytrain
A vector of length N containing the group labels. Should be coded as 1, 2, ..., G, where G is the number of groups.
Xtest
An M x p data matrix; M test observations on the rows and p features on the columns.
V
A p x r matrix of canonical vectors used to classify the observations.
prior
A logical indicating whether to assign larger weights to groups of larger size; the default value is TRUE.
tol1
Tolerance level for the eigenvalues of V^t W V; the default value is 1e-10.
Details
For a new observation with value x, classification is performed based on the smallest Mahalanobis distance in the projected space:

\min_{1\le g \le G} (V^t x - Z_g)^t (V^t W V)^{-1} (V^t x - Z_g),

where Z_g are the group-specific means of the training dataset in the projected space and W is the sample within-group covariance matrix.

If prior=TRUE, then the above distance is adjusted by -2\log\frac{n_g}{N}, where n_g is the size of group g.
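The rule above can be made concrete directly from the training data. The sketch below is a minimal illustration, not the internal code of classifyV(): the helper name manual_classify is hypothetical, it uses a plain solve() instead of the tolerance-controlled inversion governed by tol1, and the scaling of W (here N - G in the denominator) may differ by a constant factor from the one used internally, so distances are only meant to illustrate the formula.

manual_classify <- function(Xtrain, Ytrain, Xtest, V, prior = TRUE) {
  G <- length(unique(Ytrain))
  N <- nrow(Xtrain)
  p <- ncol(Xtrain)
  # pooled within-group covariance matrix W
  W <- matrix(0, p, p)
  for (g in 1:G) {
    Xg <- Xtrain[Ytrain == g, , drop = FALSE]
    W  <- W + crossprod(scale(Xg, center = TRUE, scale = FALSE))
  }
  W <- W / (N - G)
  # group means and test observations in the projected space
  Means <- t(sapply(1:G, function(g) colMeans(Xtrain[Ytrain == g, , drop = FALSE])))
  Zp    <- Means %*% V                    # G x r projected group means Z_g
  Xp    <- Xtest %*% V                    # M x r projected test observations V^t x
  Minv  <- solve(crossprod(V, W %*% V))   # (V^t W V)^{-1}, no tolerance handling here
  # prior adjustment -2 log(n_g / N), or zero when prior = FALSE
  adj <- if (prior) -2 * log(tabulate(Ytrain) / N) else rep(0, G)
  # Mahalanobis distance from every test observation to every projected group mean
  dists <- sapply(1:G, function(g) {
    d <- sweep(Xp, 2, Zp[g, ])
    rowSums((d %*% Minv) * d) + adj[g]
  })
  max.col(-dists)                         # label of the closest group
}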
Value
Returns a vector of length M with predicted group labels for the test set.
Author(s)
Irina Gaynanova
References
I. Gaynanova, J. Booth and M. Wells (2016) "Simultaneous Sparse Estimation of Canonical Vectors in the p>>N Setting", JASA, 111(514), 696-706.
Examples
### Example 1
library(MGSDA)
# generate training data: G groups of n observations with p features
n <- 10
p <- 100
G <- 3
ytrain <- rep(1:G, each = n)
set.seed(1)
xtrain <- matrix(rnorm(p * n * G), n * G, p)
# estimate the matrix of canonical vectors V
V <- dLDA(xtrain, ytrain, lambda = 0.1)
# number of features retained (rows of V with a nonzero row sum)
sum(rowSums(V) != 0)
# generate test data
m <- 20
set.seed(3)
xtest <- matrix(rnorm(p * m), m, p)
# classify the test observations
ytest <- classifyV(xtrain, ytrain, xtest, V)
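# Because the three training groups above are equally sized, the prior
# adjustment -2*log(n_g/N) is the same constant for every group, so
# classification with prior = FALSE should give identical labels.
# (ytest.flat is just an illustrative name; assumes the objects from
# Example 1 are still in the workspace.)
ytest.flat <- classifyV(xtrain, ytrain, xtest, V, prior = FALSE)
table(ytest, ytest.flat)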