R: Iterative Classical PCA

ICPCA {cellWise}

R Documentation

Iterative Classical PCA

Description

This function carries out classical PCA on data that may contain missing values, using an iterative algorithm. It is based on a Matlab function from the Missing Data Imputation Toolbox v1.0 by A. Folch-Fortuny, F. Arteaga and A. Ferrer.
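To give a rough idea of what an iterative PCA on incomplete data can look like, the following base R sketch alternates between fitting a rank-`k` PCA and re-imputing the missing cells from that fit. It is not the code used by `ICPCA`: the helper name `iterative_pca_sketch`, the mean-imputation start and the convergence measure are all assumptions made for illustration.

```r
# Hypothetical helper, not part of cellWise: a bare-bones iterative PCA imputation.
iterative_pca_sketch <- function(X, k, maxiter = 20, tol = 0.005) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Assumed starting point: fill each missing cell with its column mean.
  col_means <- colMeans(X, na.rm = TRUE)
  Ximp <- X
  Ximp[miss] <- col_means[col(X)[miss]]
  diff <- Inf
  it <- 0
  while (it < maxiter && diff > tol) {
    it <- it + 1
    # Center the currently imputed data.
    center <- colMeans(Ximp)
    Xc <- sweep(Ximp, 2, center)
    # Loadings: the first k right singular vectors of the centered data.
    P <- svd(Xc, nu = 0, nv = k)$v
    # Rank-k reconstruction, moved back to the original location.
    Xhat <- sweep(Xc %*% P %*% t(P), 2, center, "+")
    # Assumed convergence measure: largest change in any imputed cell.
    diff <- if (any(miss)) max(abs(Xhat[miss] - Ximp[miss])) else 0
    # Only the missing cells are updated; observed cells are never changed.
    Ximp[miss] <- Xhat[miss]
  }
  list(X.NAimp = Ximp, loadings = P, center = center, It = it, diff = diff)
}

# Small usage example on simulated data with 20 missing cells.
set.seed(1)
x_demo <- matrix(rnorm(200), 50, 4)
x_demo[sample(length(x_demo), 20)] <- NA
fit_demo <- iterative_pca_sketch(x_demo, k = 2)
```

The returned names mirror some of the components listed under Value (`X.NAimp`, `loadings`, `center`, `It`, `diff`), but their exact definitions in `ICPCA` may differ from this sketch.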

Usage

ICPCA(X, k, scale = FALSE, maxiter = 20, tol = 0.005,
      tolProb = 0.99, distprob = 0.99)

Arguments

`X`	the input data, which must be a matrix or a data frame. It may contain `NA`s. This argument is required.
`k`	the desired number of principal components
`scale`	a value indicating whether and how the original variables should be scaled. If `scale=FALSE` (default) or `scale=NULL`, no scaling is performed (and a vector of 1s is returned in the `$scaleX` slot). If `scale=TRUE`, the variables are scaled to have a standard deviation of 1. Alternatively, `scale` can be a function like `mad`, or a vector of length equal to the number of columns of `X`. The resulting scale estimates are returned in the `$scaleX` slot of the output.
`maxiter`	maximum number of iterations. Default is 20.
`tol`	tolerance for iterations. Default is 0.005.
`tolProb`	tolerance probability for residuals. Default is 0.99.
`distprob`	probability determining the cutoff values for orthogonal and score distances. Default is 0.99.
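The different forms accepted by `scale` can be summarized by a small base R sketch that turns the argument into one scale value per column. The helper `resolve_scale_sketch` is hypothetical and not part of cellWise; in particular, computing the scales on the observed cells only (`na.rm = TRUE`) is an assumption about how missing values would be handled.

```r
# Hypothetical helper, not part of cellWise: map `scale` to a vector of column scales.
resolve_scale_sketch <- function(X, scale = FALSE) {
  X <- as.matrix(X)
  d <- ncol(X)
  if (is.null(scale) || identical(scale, FALSE)) {
    # No scaling: a vector of 1s.
    rep(1, d)
  } else if (identical(scale, TRUE)) {
    # Standard deviation of each column, ignoring NAs (assumption).
    apply(X, 2, sd, na.rm = TRUE)
  } else if (is.function(scale)) {
    # A user-supplied scale function such as mad; it must accept na.rm here.
    apply(X, 2, scale, na.rm = TRUE)
  } else if (is.numeric(scale) && length(scale) == d) {
    # Scales supplied directly, one per column.
    scale
  } else {
    stop("`scale` must be FALSE, NULL, TRUE, a function, or a vector of length ncol(X)")
  }
}

# Two columns, the first with one missing value.
X_demo <- cbind(c(1, 2, 3, NA), c(2, 4, 6, 8))
s_none <- resolve_scale_sketch(X_demo)               # no scaling
s_sd   <- resolve_scale_sketch(X_demo, scale = TRUE) # standard deviations
s_mad  <- resolve_scale_sketch(X_demo, scale = mad)  # median absolute deviations
s_vec  <- resolve_scale_sketch(X_demo, scale = c(2, 5))
```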

Value

A list with components:

`scaleX`	the scales of the columns of X.
`k`	the number of principal components.
`loadings`	the columns are the k loading vectors.
`eigenvalues`	the k eigenvalues.
`center`	vector with the fitted center.
`covmatrix`	estimated covariance matrix.
`It`	number of iteration steps.
`diff`	convergence criterion.
`X.NAimp`	data with all `NA`s imputed.
`scores`	scores of X.NAimp.
`OD`	orthogonal distances of the rows of X.NAimp.
`cutoffOD`	cutoff value for the OD.
`SD`	score distances of the rows of X.NAimp.
`cutoffSD`	cutoff value for the SD.
`highOD`	row numbers of cases whose `OD` is above `cutoffOD`.
`highSD`	row numbers of cases whose `SD` is above `cutoffSD`.
`residScale`	scale of the residuals.
`stdResid`	standardized residuals. Note that these are NA for all missing values of `X`.
`indcells`	indices of cellwise outliers.
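As an illustration of how `scores`, `OD` and `SD` relate to `center`, `loadings` and `eigenvalues`, the base R sketch below uses the usual textbook definitions: the orthogonal distance is the Euclidean distance of a row to the PCA subspace, and the score distance is the distance of its scores to the origin after dividing by the eigenvalues. The helper `pca_distances_sketch` is hypothetical, and the chi-square cutoff for the score distances is a common convention assumed here, not something this page confirms for `ICPCA`. No cutoff for `OD` is attempted.

```r
# Hypothetical helper, not part of cellWise: textbook OD and SD from a PCA fit.
pca_distances_sketch <- function(Ximp, center, loadings, eigenvalues,
                                 distprob = 0.99) {
  Xc <- sweep(as.matrix(Ximp), 2, center)
  # Scores: coordinates of the centered rows in the PCA subspace.
  scores <- Xc %*% loadings
  # Residuals: what is left after projecting onto the subspace.
  resid <- Xc - scores %*% t(loadings)
  OD <- sqrt(rowSums(resid^2))
  # Score distance: scores standardized by the eigenvalues.
  SD <- sqrt(rowSums(sweep(scores^2, 2, eigenvalues, "/")))
  # Assumed cutoff: square root of a chi-square quantile with k degrees of freedom.
  cutoffSD <- sqrt(qchisq(distprob, df = ncol(loadings)))
  list(scores = scores, OD = OD, SD = SD,
       cutoffSD = cutoffSD, highSD = which(SD > cutoffSD))
}

# Usage on complete simulated data, with prcomp() supplying the PCA fit.
set.seed(2)
x_full <- matrix(rnorm(300), 100, 3)
pc <- prcomp(x_full)
k_demo <- 2
dist_demo <- pca_distances_sketch(x_full, pc$center,
                                  pc$rotation[, 1:k_demo],
                                  pc$sdev[1:k_demo]^2)
```

With three variables and `k_demo = 2`, the orthogonal distance of each row is simply the absolute value of its third principal component score, which gives a quick sanity check of the computation.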

Author(s)

Wannes Van Den Bossche

References

Folch-Fortuny, A., Arteaga, F., Ferrer, A. (2016). Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 154, 93-100.

Examples

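library(MASS)      # for mvrnorm()
library(cellWise)  # for ICPCA()
set.seed(12345)
n <- 100; d <- 10
A <- diag(d) * 0.1 + 0.9   # covariance: 1 on the diagonal, 0.9 elsewhere
x <- mvrnorm(n, rep(0, d), A)
x[sample(1:(n * d), 100, FALSE)] <- NA   # set 100 randomly chosen cells to NA
ICPCA.out <- ICPCA(x, k = 2)
plot(ICPCA.out$scores)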

[Package cellWise version 2.5.3 Index]