MacroPCA {cellWise}  R Documentation 
MacroPCA
Description
This function performs the MacroPCA algorithm, which can deal with Missing values and Cellwise
and Rowwise Outliers. Note that this function first calls checkDataSet
and analyzes the remaining cleaned data.
Usage
MacroPCA(X, k = 0, MacroPCApars = NULL)
Arguments
X 
X is the input data, and must be an n by d matrix or a data frame.

k 
k is the desired number of principal components.
If k = 0 or k = NULL, the algorithm computes the percentage
of explained variability for each k up to kmax, shows a scree plot,
and suggests choosing a value of k for which the cumulative percentage of
explained variability is at least 80%.
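The 80% guideline can be illustrated with a minimal base R sketch. The eigenvalues below are made-up numbers, and the code only demonstrates the selection rule described above; it is not the package's internal computation.

```r
# Hypothetical eigenvalues of a PCA fit, sorted in decreasing order.
eigenvalues <- c(5.2, 2.1, 0.9, 0.5, 0.3)

# Cumulative percentage of explained variability for k = 1, ..., kmax.
cum_pct <- 100 * cumsum(eigenvalues) / sum(eigenvalues)

# Smallest k whose cumulative percentage reaches the 80% guideline.
k_suggested <- which(cum_pct >= 80)[1]
k_suggested
```

The resulting value could then be passed as the k argument in a second call.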

MacroPCApars 
A list of available options detailed below. If MacroPCApars = NULL the defaults below are used.
DDCpars A list with parameters for the first step of the MacroPCA
algorithm (for the complete list see the function
DDC). Default is NULL.
kmax The maximal number of principal components to compute. Default
is kmax = 10. If k is provided, kmax does not need to be specified,
unless k is larger than 10, in which case kmax must be set
high enough (at least k).
alpha This is the coverage, i.e. the fraction of rows the algorithm
should give full weight. alpha should be between 0.50 and 1; the default is
0.50.
scale A value indicating whether and how the original variables should
be scaled. If scale = FALSE or scale = NULL, no scaling is
performed (and a vector of 1s is returned in the $scaleX slot).
If scale = TRUE (default), the data are scaled by a 1-step M-estimator of
scale with the Tukey biweight weight function, to have a robust scale of 1.
Alternatively, scale can be a vector of length
equal to the number of columns of X. The resulting scale estimates are
returned in the $scaleX slot of the MacroPCA output.
maxdir The maximal number of random directions to use for computing the
outlyingness of the data points. Default is maxdir = 250. If the number
n of observations is small, all n * (n - 1) / 2 pairs of
observations are used.
distprob The quantile determining the cutoff values
for orthogonal and score distances. Default is 0.99.

silent If TRUE, statements tracking the algorithm's progress will not be printed. Defaults to FALSE.
maxiter Maximum number of iterations. Default is 20.
tol Tolerance for iterations. Default is 0.005.
bigOutput Whether to compute and return NAimp, Cellimp and Fullimp. Defaults to TRUE.
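As a minimal sketch of how these options might be supplied, the code below builds a MacroPCApars list from option names documented above and passes it to MacroPCA. The data set and the chosen option values are made up for illustration, and the call is only attempted if the cellWise package is installed.

```r
# Options named on this help page; anything omitted keeps its default.
pars <- list(kmax = 5, alpha = 0.75, scale = TRUE,
             distprob = 0.99, silent = TRUE, bigOutput = FALSE)

# Made-up data: d correlated columns driven by one shared factor.
set.seed(1)
n <- 60; d <- 5
z <- rnorm(n)
X <- outer(z, rep(1, d)) + matrix(rnorm(n * d, sd = 0.3), n, d)
X[sample(n * d, 10)] <- NA   # some missing cells
X[sample(n * d, 10)] <- 8    # some outlying cells

# Run MacroPCA with two components, only if cellWise is available.
if (requireNamespace("cellWise", quietly = TRUE)) {
  fit <- cellWise::MacroPCA(X, k = 2, MacroPCApars = pars)
  print(fit$k)
  print(dim(fit$loadings))
}
```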

Value
A list with components:
MacroPCApars 
the options used in the call.

remX 
Cleaned data after checkDataSet.

DDC 
results of the first step of MacroPCA. These are needed to run
MacroPCApredict on new data.

scaleX 
the scales of the columns of X.

k 
the number of principal components.

loadings 
the columns are the k loading vectors.

eigenvalues 
the k eigenvalues.

center 
vector with the fitted center.

alpha 
alpha from the input.

h 
h (computed from alpha).

It 
number of iteration steps.

diff 
convergence criterion.

X.NAimp 
data with all NAs imputed by MacroPCA.

scores 
scores of X.NAimp.

OD 
orthogonal distances of the rows of X.NAimp.

cutoffOD 
cutoff value for the OD.

SD 
score distances of the rows of X.NAimp.

cutoffSD 
cutoff value for the SD.

indrows 
row numbers of rowwise outliers.

residScale 
scale of the residuals.

stdResid 
standardized residuals. Note that these are NA
for all missing values of X.

indcells 
indices of cellwise outliers.

NAimp 
various results for the NA-imputed data.

Cellimp 
various results for the cell-imputed data.

Fullimp 
various results for the fully imputed data.
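The distances and cutoffs in the output can be combined to label rows, as in the following sketch. The helper classify_rows is hypothetical (it is not part of cellWise), the labels follow the usual outlier-map convention for robust PCA rather than anything stated on this page, and the toy list merely stands in for a MacroPCA result.

```r
# Hypothetical helper: label each row by comparing its score distance (SD)
# and orthogonal distance (OD) with the corresponding cutoffs.
classify_rows <- function(fit) {
  high_sd <- fit$SD > fit$cutoffSD
  high_od <- fit$OD > fit$cutoffOD
  ifelse(high_sd & high_od, "bad leverage",
         ifelse(high_od, "orthogonal outlier",
                ifelse(high_sd, "good leverage", "regular")))
}

# Toy stand-in for a MacroPCA result (made-up numbers).
toy <- list(SD = c(0.5, 4.0, 0.8, 5.0),
            OD = c(0.2, 0.3, 3.0, 4.0),
            cutoffSD = 3, cutoffOD = 2)
row_labels <- classify_rows(toy)
table(row_labels)
```

With a real fit, the same helper could be applied to the list returned by MacroPCA, since it only uses the SD, OD, cutoffSD and cutoffOD components.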

Author(s)
Rousseeuw P.J., Van den Bossche W.
References
Hubert, M., Rousseeuw, P.J., Van den Bossche, W. (2019). MacroPCA: An all-in-one PCA method allowing for missing values as well as cellwise and rowwise outliers. Technometrics, 61(4), 459-473.
See Also
checkDataSet, cellMap, DDC
Examples
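library(cellWise)
library(MASS)
set.seed(12345)
n <- 50; d <- 10
# Equicorrelated covariance matrix: 0.9 off the diagonal, 1 on it.
A <- matrix(0.9, d, d); diag(A) <- 1
x <- mvrnorm(n, rep(0, d), A)
# Set 50 random cells to NA and 50 random cells to the outlying value 10.
x[sample(1:(n * d), 50, FALSE)] <- NA
x[sample(1:(n * d), 50, FALSE)] <- 10
# Prepend a column of case numbers.
x <- cbind(1:n, x)
MacroPCA.out <- MacroPCA(x, 2)
cellMap(MacroPCA.out$remX, MacroPCA.out$stdResid,
        columnlabels = 1:d, rowlabels = 1:n)
# For more examples, we refer to the vignette:
vignette("MacroPCA_examples")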
[Package cellWise version 2.2.5 Index]