ICPCA {cellWise} | R Documentation |
Iterative Classical PCA
Description
This function carries out classical PCA on data that may contain missing values, using an iterative algorithm. It is based on a MATLAB function from the Missing Data Imputation Toolbox v1.0 by A. Folch-Fortuny, F. Arteaga and A. Ferrer.
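To illustrate the general idea of iterative PCA with missing values, here is a minimal base-R sketch. It is not the code used by ICPCA or by the MATLAB toolbox: the helper name `iterative_pca_impute`, the column-mean starting values, and the mean-squared-change stopping rule are all assumptions made for illustration.

```r
# Hypothetical helper (not part of cellWise): alternate between fitting a
# rank-k PCA to the completed data and re-imputing the missing cells from it.
iterative_pca_impute <- function(X, k, maxiter = 20, tol = 0.005) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Assumed starting values: fill each missing cell with its column mean.
  col_means <- colMeans(X, na.rm = TRUE)
  Ximp <- X
  Ximp[miss] <- col_means[col(X)[miss]]
  it <- 0
  diff <- Inf
  center <- col_means
  P <- NULL
  while (it < maxiter && diff > tol) {
    it <- it + 1
    center <- colMeans(Ximp)
    Xc <- sweep(Ximp, 2, center)
    # Loadings of a rank-k PCA of the current completed data, via the SVD.
    P <- svd(Xc, nu = 0, nv = k)$v
    # Rank-k reconstruction, moved back to the original location.
    Xhat <- sweep(Xc %*% P %*% t(P), 2, center, "+")
    # Assumed stopping rule: mean squared change in the imputed cells.
    diff <- if (any(miss)) mean((Xhat[miss] - Ximp[miss])^2) else 0
    # Only the missing cells are updated; observed cells stay untouched.
    Ximp[miss] <- Xhat[miss]
  }
  list(X.NAimp = Ximp, loadings = P, center = center, It = it, diff = diff)
}

# Small usage example on simulated data with 20 missing cells.
set.seed(1)
X <- matrix(rnorm(200), 50, 4)
X[, 4] <- X[, 1] + 0.1 * X[, 4]  # make two columns strongly correlated
X[sample(200, 20)] <- NA
fit <- iterative_pca_impute(X, k = 2)
```

The sketch keeps observed cells fixed and only overwrites the cells that were NA, so the output element `X.NAimp` agrees with the input wherever the input was observed.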
Usage
ICPCA(X, k, scale = FALSE, maxiter = 20, tol = 0.005,
tolProb = 0.99, distprob = 0.99)
Arguments
X |
the input data, which must be a matrix or a data frame. It may contain NA's. It must always be provided. |
k |
the desired number of principal components. |
scale |
a value indicating whether and how the original variables should be scaled. Default is FALSE. |
maxiter |
maximum number of iterations. Default is 20. |
tol |
tolerance for iterations. Default is 0.005. |
tolProb |
tolerance probability for residuals. Defaults to 0.99. |
distprob |
probability determining the cutoff values for orthogonal and score distances. Default is 0.99. |
Value
A list with components:
scaleX |
the scales of the columns of X. |
k |
the number of principal components. |
loadings |
the columns are the k loading vectors. |
eigenvalues |
the k eigenvalues. |
center |
vector with the fitted center. |
covmatrix |
estimated covariance matrix. |
It |
number of iteration steps. |
diff |
convergence criterion. |
X.NAimp |
data with all NA's imputed. |
scores |
scores of X.NAimp. |
OD |
orthogonal distances of the rows of X.NAimp. |
cutoffOD |
cutoff value for the OD. |
SD |
score distances of the rows of X.NAimp. |
cutoffSD |
cutoff value for the SD. |
highOD |
row numbers of cases whose OD is above cutoffOD. |
highSD |
row numbers of cases whose SD is above cutoffSD. |
residScale |
scale of the residuals. |
stdResid |
standardized residuals. Note that these are NA for all missing values of X. |
indcells |
indices of cellwise outliers. |
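For readers unfamiliar with orthogonal and score distances, the following base-R sketch shows how such quantities are commonly defined for a k-component PCA fit. It uses prcomp on complete data and does not call ICPCA. The chi-square cutoff for the score distance is a common convention and is an assumption here; this page does not state how ICPCA computes cutoffSD or cutoffOD.

```r
# Simulated complete data with three correlated columns.
set.seed(1)
X <- matrix(rnorm(300), 100, 3) %*%
  matrix(c(1, 0.8, 0.5, 0, 1, 0.5, 0, 0, 0.2), 3, 3)
k <- 2

pc <- prcomp(X, center = TRUE, scale. = FALSE)
loadings <- pc$rotation[, 1:k, drop = FALSE]
eigenvalues <- pc$sdev[1:k]^2
scores <- pc$x[, 1:k, drop = FALSE]

# Score distance: distance within the PCA subspace, with each score
# standardized by the corresponding eigenvalue.
SD <- sqrt(rowSums(sweep(scores^2, 2, eigenvalues, "/")))

# Orthogonal distance: Euclidean norm of the residual left after
# projecting the centered data onto the first k loading vectors.
Xc <- sweep(X, 2, pc$center)
resid <- Xc - scores %*% t(loadings)
OD <- sqrt(rowSums(resid^2))

# Assumed cutoff for SD: square root of a chi-square quantile with k
# degrees of freedom, at probability 0.99 (mirroring distprob = 0.99).
cutoffSD <- sqrt(qchisq(0.99, df = k))
highSD <- which(SD > cutoffSD)
```

Rows with a large SD lie far from the center within the fitted subspace, while rows with a large OD lie far from the subspace itself.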
Author(s)
Wannes Van Den Bossche
References
Folch-Fortuny, A., Arteaga, F., Ferrer, A. (2016). Missing Data Imputation Toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 154, 93-100.
Examples
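library(MASS)      # for mvrnorm()
library(cellWise)  # for ICPCA()
set.seed(12345)
n <- 100; d <- 10
# Covariance matrix with unit variances and all correlations equal to 0.9
A <- diag(d) * 0.1 + 0.9
x <- mvrnorm(n, rep(0, d), A)
# Set 100 randomly chosen cells to NA
x[sample(1:(n * d), 100, FALSE)] <- NA
# Fit iterative classical PCA with 2 components and plot the scores
ICPCA.out <- ICPCA(x, k = 2)
plot(ICPCA.out$scores)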