CSimca {rrcovHD} | R Documentation |
Classification in high dimensions based on the (classical) SIMCA method
Description
CSimca performs the (classical) SIMCA method. This method classifies a data matrix x with a known group structure. To reduce the dimension, a PCA is performed on each group separately. Afterwards, a classification rule is developed to determine the assignment of new observations.
Usage
CSimca(x, ...)
## Default S3 method:
CSimca(x, grouping, prior=proportions, k, kmax = ncol(x),
tol = 1.0e-4, trace=FALSE, ...)
## S3 method for class 'formula'
CSimca(formula, data = NULL, ..., subset, na.action)
Arguments
formula |
a formula of the form |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain |
x |
a matrix or data frame containing the explanatory variables (training set). |
grouping |
grouping variable: a factor specifying the class for each observation. |
prior |
prior probabilities, default to the class proportions for the training set. |
tol |
tolerance |
k |
number of principal components to compute. If |
kmax |
maximal number of principal components to compute.
Default is |
trace |
whether to print intermediate results. Default is |
... |
arguments passed to or from other methods. |
Details
CSimca, serving as a constructor for objects of class CSimca-class, is a generic function with "formula" and "default" methods.
SIMCA is a two-phase procedure: first, PCA is performed on each group separately for dimension reduction; then classification rules are built in the lower-dimensional space (note that the dimension can differ from group to group). In the original SIMCA, new observations are classified by means of their deviations from the different PCA models. Here (and also in the robust versions implemented in this package) the classification rules are obtained using two popular distances arising from PCA: orthogonal distances (OD) and score distances (SD). For the definition of these distances, the definition of the cutoff values and the standardization of the distances, see Vanden Branden and Hubert (2005) and Todorov and Filzmoser (2009).
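To make the two distances concrete, the following is a minimal sketch in base R. It is not the code used by CSimca: the helper name pca_distances, the choice k = 2, and the use of the iris data are assumptions made purely for illustration. The cutoff values and the standardization described in the references are deliberately left out.

```r
## Illustrative sketch only (not the rrcovHD implementation).
## pca_distances() is a hypothetical helper: it fits a classical PCA on one
## group and returns score and orthogonal distances for new observations.
pca_distances <- function(train, newdata, k) {
  train   <- as.matrix(train)
  newdata <- as.matrix(newdata)
  pc      <- prcomp(train, center = TRUE, scale. = FALSE)
  k       <- min(k, length(pc$sdev))                 # cap k at the available components
  load    <- pc$rotation[, seq_len(k), drop = FALSE] # p x k loadings
  centred <- sweep(newdata, 2, pc$center)            # subtract the group mean
  scores  <- centred %*% load                        # n x k scores
  ## Score distance: Mahalanobis-type distance inside the PCA subspace
  score_dist <- sqrt(rowSums(sweep(scores^2, 2, pc$sdev[seq_len(k)]^2, "/")))
  ## Orthogonal distance: Euclidean distance to the PCA subspace
  orth_dist <- sqrt(rowSums((centred - scores %*% t(load))^2))
  data.frame(SD = score_dist, OD = orth_dist)
}

## One PCA model per group, then distances of every observation to each model
x      <- iris[, 1:4]
groups <- split(x, iris$Species)
dists  <- lapply(groups, function(g) pca_distances(g, x, k = 2))

## Deliberately naive rule (an assumption, not the rule from the references):
## assign each observation to the group whose PCA subspace is closest in OD
od_mat <- sapply(dists, function(d) d$OD)
pred   <- colnames(od_mat)[apply(od_mat, 1, which.min)]
table(iris$Species, pred)
```

The naive rule above ignores SD entirely and compares raw ODs across groups with different scales, which is exactly what the cutoff values and standardization in the cited papers are meant to address.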
Value
An S4 object of class CSimca-class, which is a subclass of the virtual class Simca-class.
Author(s)
Valentin Todorov valentin.todorov@chello.at
References
Vanden Branden K, Hubert M (2005) Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems 79:10–21
Todorov V & Filzmoser P (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47, doi:10.18637/jss.v032.i03.
Todorov V & Filzmoser P (2014), Software Tools for Robust Analysis of High-Dimensional Data. Austrian Journal of Statistics, 43(4), 255–266, doi:10.17713/ajs.v43i4.44.
Examples
library(rrcovHD)
data(pottery)
dim(pottery) # 27 observations in 2 classes, 6 variables
head(pottery)
## Build the SIMCA model. Use RSimca for a robust version
cs <- CSimca(origin~., data=pottery)
cs
summary(cs)
## generate a sample from the pottery data set -
## this will be the "new" data to be predicted
smpl <- sample(seq_len(nrow(pottery)), 5)
test <- pottery[smpl, -7] # extract the test sample; drop column 7, the grouping variable (origin)
print(test)
## predict new data
pr <- predict(cs, newdata=test)
pr@classification