generateData {cellWise} | R Documentation
Generates artificial datasets with outliers
Description
This function generates multivariate normal datasets with several possible types of outliers. It is used in several simulation studies. For a detailed description, see the referenced papers.
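To make the idea concrete, here is a minimal base-R sketch of clean multivariate normal data in which a fraction of the cells is overwritten with a fixed value. It is an illustration only, not the implementation used by generateData. The helper name sketch_cellwise and the choice to replace cells by gamma itself are assumptions made for this example.

```r
# Hypothetical helper (not part of cellWise): draws n observations from
# N(mu, Sigma) and overwrites a fraction `perout` of the cells with `gamma`.
sketch_cellwise <- function(n, d, mu, Sigma, perout, gamma, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Clean data: with R = chol(Sigma), Z %*% R has covariance t(R) %*% R = Sigma.
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d)
  X <- sweep(Z %*% chol(Sigma), 2, mu, "+")
  # Sample linear (column-major) indices of the cells to contaminate.
  indcells <- sort(sample.int(n * d, size = floor(perout * n * d)))
  X[indcells] <- gamma
  list(X = X, indcells = indcells)
}

sim <- sketch_cellwise(n = 100, d = 5, mu = rep(0, 5), Sigma = diag(5),
                       perout = 0.1, gamma = 10, seed = 1)
dim(sim$X)            # 100 rows, 5 columns
length(sim$indcells)  # floor(0.1 * 100 * 5) = 50 contaminated cells
```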
Usage
generateData(n, d, mu, Sigma, perout, gamma,
outlierType = "casewise", seed = NULL)
Arguments
n |
The number of observations. |
d |
The dimension of the data. |
mu |
The center of the clean data. |
Sigma |
The covariance matrix of the clean data. Could be obtained from |
outlierType |
The type of contamination to be generated. Should be one of:
|
perout |
The percentage of generated outliers. For |
gamma |
How far outliers are from the center of the distribution. |
seed |
Seed used to generate the data. |
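The mu and Sigma arguments have to describe a valid d-dimensional normal distribution. A minimal sketch of building such inputs in base R follows. The AR(1)-style correlation structure and the value rho = 0.5 are arbitrary choices for illustration, not defaults of the package.

```r
d   <- 5
rho <- 0.5                      # assumed correlation decay, chosen for illustration
mu  <- rep(0, d)                # center of the clean data

# Sigma[i, j] = rho^|i - j|: symmetric with unit diagonal.
Sigma <- rho^abs(outer(seq_len(d), seq_len(d), "-"))

# A covariance matrix must be symmetric and positive definite,
# and its size must match the length of mu.
stopifnot(isSymmetric(Sigma),
          all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0),
          length(mu) == ncol(Sigma))
```

Checking these properties before a simulation run avoids failures that would otherwise surface only when the data are generated.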
Value
A list with components:
X
The generated data matrix of size n \times d.
indcells
A vector with the indices of the contaminated cells.
indrows
A vector with the indices of the rowwise outliers.
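If indcells holds linear (column-major) positions in X, base R's arrayInd converts them into row and column pairs. That layout is an assumption here and is worth checking against the returned object. The list below is a hand-made stand-in for the function's output so that the snippet runs without the package.

```r
# Stand-in for a generateData() result: a 4 x 3 matrix with three flagged cells.
out <- list(X = matrix(0, nrow = 4, ncol = 3), indcells = c(2L, 7L, 12L))

# Map linear indices to (row, column) pairs.
pos <- arrayInd(out$indcells, .dim = dim(out$X))
colnames(pos) <- c("row", "col")
pos

# Rows that contain at least one flagged cell.
rows_hit <- sort(unique(pos[, "row"]))
rows_hit
```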
Author(s)
J. Raymaekers and P.J. Rousseeuw
References
Agostinelli, C., Leung, A., Yohai, V.J., and Zamar, R.H. (2015). Robust Estimation of Multivariate Location and Scatter in the Presence of Cellwise and Casewise Contamination. Test, 24, 441-461.
Rousseeuw, P.J. and Van den Bossche, W. (2018). Detecting Deviating Data Cells. Technometrics, 60(2), 135-145.
Raymaekers, J. and Rousseeuw, P.J. (2020). Handling cellwise outliers by sparse regression and robust covariance. arXiv:1912.12446.
See Also
Examples
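library("cellWise")
n <- 100               # number of observations
d <- 5                 # dimension of the data
mu <- rep(0, d)        # center of the clean data
Sigma <- diag(d)       # covariance matrix of the clean data
perout <- 0.1          # percentage of generated outliers
gamma <- 10            # how far outliers are from the center
data <- generateData(n, d, mu, Sigma, perout, gamma,
                     outlierType = "cellwisePlain", seed = 1)
pairs(data$X)          # scatterplot matrix of the generated data
data$indcells          # indices of the contaminated cells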