ICPCA {cellWise}    R Documentation

Iterative Classical PCA

Description

This function carries out classical PCA on data that may contain missing values, using an iterative algorithm. It is based on a MATLAB function from the Missing Data Imputation Toolbox v1.0 by A. Folch-Fortuny, F. Arteaga and A. Ferrer.
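To give a feel for what "iterative" means here, the following is a minimal base R sketch of PCA-based imputation. It is not the cellWise implementation: the helper name iterative_pca_impute, the column-mean starting values and the convergence criterion (largest change among the imputed cells) are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of cellWise: a sketch of iterative PCA imputation.
iterative_pca_impute <- function(X, k, maxiter = 20, tol = 0.005) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Assumed starting point: fill each missing cell with its column mean.
  col_means <- colMeans(X, na.rm = TRUE)
  Ximp <- X
  Ximp[miss] <- col_means[col(X)[miss]]
  for (it in seq_len(maxiter)) {
    center <- colMeans(Ximp)
    Xc <- sweep(Ximp, 2, center)
    # Rank-k fit from the SVD of the centered, currently imputed data.
    sv <- svd(Xc, nu = 0, nv = k)
    P <- sv$v                              # d x k loading matrix
    fit <- sweep(Xc %*% P %*% t(P), 2, center, "+")
    # Assumed criterion: largest change among the imputed cells.
    diff <- if (any(miss)) max(abs(fit[miss] - Ximp[miss])) else 0
    # Only the missing cells are updated; observed cells stay untouched.
    Ximp[miss] <- fit[miss]
    if (diff < tol) break
  }
  list(X.NAimp = Ximp, loadings = P, center = center, It = it, diff = diff)
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
X[sample(length(X), 10)] <- NA
out <- iterative_pca_impute(X, k = 2)
anyNA(out$X.NAimp)   # checks that every missing cell was filled in
```

Each pass refits the PCA model on the current imputed data and overwrites only the cells that were originally missing, so the observed values are never altered.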

Usage

ICPCA(X, k, scale = FALSE, maxiter = 20, tol = 0.005,
      tolProb = 0.99, distprob = 0.99) 

Arguments

X

the input data, which must be a matrix or a data frame. It may contain NA's. It must always be provided.

k

the desired number of principal components.

scale

a value indicating whether and how the original variables should be scaled. If scale=FALSE (default) or scale=NULL, no scaling is performed (and a vector of 1s is returned in the $scaleX slot). If scale=TRUE, the variables are scaled to have a standard deviation of 1. Alternatively, scale can be a function like mad, or a vector of length equal to the number of columns of X. The resulting scale estimates are returned in the $scaleX slot of the output.

maxiter

maximum number of iterations. Default is 20.

tol

tolerance for iterations. Default is 0.005.

tolProb

tolerance probability for residuals. Default is 0.99.

distprob

probability determining the cutoff values for orthogonal and score distances. Default is 0.99.
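As an illustration of how the arguments above might be combined, the sketch below calls ICPCA with a few different settings. It assumes the cellWise and MASS packages are installed; the simulated data and the object names (fit_raw, fit_std, fit_strict) are made up for this example, and the particular values of maxiter and tol are arbitrary.

```r
library(cellWise)
library(MASS)

set.seed(12345)
n <- 100; d <- 10
# Simulated data (illustrative only): equicorrelated normal with 100 NA cells.
x <- mvrnorm(n, rep(0, d), diag(d) * 0.1 + 0.9)
x[sample(n * d, 100)] <- NA

# Default: scale = FALSE, so $scaleX is documented to be a vector of 1s.
fit_raw <- ICPCA(x, k = 2)

# scale = TRUE: variables are scaled to standard deviation 1.
fit_std <- ICPCA(x, k = 2, scale = TRUE)

# More iterations and a tighter tolerance (arbitrary illustrative values).
fit_strict <- ICPCA(x, k = 2, maxiter = 50, tol = 1e-4)

fit_raw$scaleX
fit_std$scaleX
c(default = fit_raw$It, strict = fit_strict$It)
```

Comparing $It between the default and the stricter call shows whether the tighter tolerance actually required additional iteration steps on a given data set.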

Value

A list with components:

scaleX

the scales of the columns of X.

k

the number of principal components.

loadings

the columns are the k loading vectors.

eigenvalues

the k eigenvalues.

center

vector with the fitted center.

covmatrix

estimated covariance matrix.

It

number of iteration steps.

diff

convergence criterion.

X.NAimp

data with all NA's imputed.

scores

scores of X.NAimp.

OD

orthogonal distances of the rows of X.NAimp.

cutoffOD

cutoff value for the OD.

SD

score distances of the rows of X.NAimp.

cutoffSD

cutoff value for the SD.

highOD

row numbers of cases whose OD is above cutoffOD.

highSD

row numbers of cases whose SD is above cutoffSD.

residScale

scale of the residuals.

stdResid

standardized residuals. Note that these are NA for all missing values of X.

indcells

indices of cellwise outliers.
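The components listed above can be inspected directly on the returned list. The following sketch assumes the cellWise and MASS packages are installed and reuses the kind of simulated data shown in the Examples section; the object name fit is arbitrary.

```r
library(cellWise)
library(MASS)

set.seed(12345)
n <- 100; d <- 10
x <- mvrnorm(n, rep(0, d), diag(d) * 0.1 + 0.9)
x[sample(n * d, 100)] <- NA

fit <- ICPCA(x, k = 2)

# One column per loading vector, so k columns in total.
dim(fit$loadings)

# The imputed data should no longer contain missing values.
anyNA(fit$X.NAimp)

# Iteration count and final value of the convergence criterion.
fit$It
fit$diff

# Rows whose orthogonal or score distance exceeds the respective cutoff.
fit$highOD
fit$highSD

# Standardized residuals are NA wherever X was missing.
all(is.na(fit$stdResid[is.na(x)]))
```

Looking at highOD and highSD together helps separate rows that lie far from the fitted subspace from rows that are extreme within it.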

Author(s)

Wannes Van Den Bossche

References

Folch-Fortuny, A., Arteaga, F., Ferrer, A. (2016). Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 154, 93-100.

Examples

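library(cellWise) # provides ICPCA
library(MASS)     # provides mvrnorm
set.seed(12345)
n <- 100; d <- 10
# Covariance matrix with unit variances and all correlations equal to 0.9
A <- diag(d) * 0.1 + 0.9
x <- mvrnorm(n, rep(0, d), A)
# Set 100 randomly chosen cells to NA
x[sample(1:(n * d), 100, FALSE)] <- NA
# Fit two principal components and plot the scores
ICPCA.out <- ICPCA(x, k = 2)
plot(ICPCA.out$scores)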

[Package cellWise version 2.5.3 Index]