generateData {cellWise}  R Documentation 
This function generates multivariate normal datasets with several possible types of outliers. It is used in several simulation studies. For a detailed description, see the referenced papers.
generateData(n, d, mu, Sigma, perout, gamma, outlierType = "casewise", seed = NULL)
n 
The number of observations.
d 
The dimension of the data. 
mu 
The center of the clean data. 
Sigma 
The covariance matrix of the clean data. Could be obtained from 
outlierType 
The type of contamination to be generated. Should be one of:

perout 
The percentage of generated outliers. For 
gamma 
How far outliers are from the center of the distribution. 
seed 
Seed used to generate the data. 
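As a rough illustration of how the arguments fit together, the base R sketch below builds a valid covariance matrix, draws clean multivariate normal rows with center mu, and then overwrites a fraction perout of the cells in each column with the value gamma. This is not the cellWise implementation. The correlation parameter rho, the helper variables k, rows and cells, and the contamination scheme itself are assumptions made for illustration; the referenced papers describe the outlier types actually generated.

```r
# Illustrative sketch only; this is not the cellWise implementation.
set.seed(1)
n <- 200
d <- 4
mu <- rep(0, d)
rho <- 0.5  # hypothetical correlation decay, chosen for illustration
Sigma <- rho^abs(outer(seq_len(d), seq_len(d), "-"))

# A usable covariance matrix is symmetric with strictly positive eigenvalues.
stopifnot(isSymmetric(Sigma),
          all(eigen(Sigma, symmetric = TRUE)$values > 0))

# Clean data: if R = chol(Sigma), then t(R) %*% R = Sigma, so the rows of
# Z %*% R have covariance Sigma. sweep() then shifts each column by mu.
Z <- matrix(rnorm(n * d), nrow = n, ncol = d)
X <- sweep(Z %*% chol(Sigma), 2, mu, "+")

# Naive cellwise contamination (assumed scheme): in each column, set a
# fraction perout of the cells to the value gamma.
perout <- 0.1
gamma <- 10
k <- round(perout * n)
rows <- replicate(d, sample.int(n, k))  # k x d matrix of row indices
# Convert (row, column) pairs to column-major linear indices into X.
cells <- as.vector(rows + rep((seq_len(d) - 1) * n, each = k))
X[cells] <- gamma
```

Setting every outlying cell to the same constant is the simplest possible choice and makes the contaminated cells easy to spot in a pairs plot; other schemes place outliers along structured directions instead.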
A list with components:
X
The generated data matrix of size n × d.
indcells
A vector with the indices of the contaminated cells.
indrows
A vector with the indices of the rowwise outliers.
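The returned indices can be mapped back to positions in X. The sketch below uses a small made-up stand-in for the returned list so that it runs without the package. It assumes that indcells holds column-major linear indices into X, which is worth verifying against actual output before relying on it.

```r
# Stand-in for the list returned by generateData(); the values are made up.
set.seed(1)
n <- 6
d <- 3
out <- list(X = matrix(rnorm(n * d), nrow = n, ncol = d),
            indcells = c(2, 9, 16),
            indrows = integer(0))

# Assumption: indcells are column-major linear indices into X.
# arrayInd() converts them to (row, column) pairs.
pos <- arrayInd(out$indcells, dim(out$X))
colnames(pos) <- c("row", "col")
pos

# Logical mask of the flagged cells, e.g. to count them per column.
mask <- matrix(FALSE, nrow = n, ncol = d)
mask[out$indcells] <- TRUE
colSums(mask)
```

A logical mask of the same shape as X is convenient for comparing the true contaminated cells with the cells flagged by a detection method.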
J. Raymaekers and P.J. Rousseeuw
Agostinelli, C., Leung, A., Yohai, V. J., and Zamar, R. H. (2015). Robust Estimation of Multivariate Location and Scatter in the Presence of Cellwise and Casewise Contamination. Test, 24, 441-461.
Rousseeuw, P. J., and Van den Bossche, W. (2018). Detecting Deviating Data Cells. Technometrics, 60(2), 135-145.
Raymaekers, J., and Rousseeuw, P. J. (2020). Handling cellwise outliers by sparse regression and robust covariance. arXiv: 1912.12446.
n <- 100
d <- 5
mu <- rep(0, d)
Sigma <- diag(d)
perout <- 0.1
gamma <- 10
data <- generateData(n, d, mu, Sigma, perout, gamma, outlierType = "cellwisePlain", seed = 1)
pairs(data$X)
data$indcells