simulation-tools {robreg3S} | R Documentation |
Data generator for simulation study on cell- and case-wise contamination
Description
Provides the data generators for the simulation study on cell- and case-wise contamination that appears in Leung et al. (2015).
Usage
generate.randbeta(p)
generate.cellcontam.regress(n, p, A, sigma, b, k, cp)
generate.casecontam.regress(n, p, A, sigma, b, l, k, cp)
generate.cellcontam.regress.dummies(n, p, pd, probd, A, sigma, b, k, cp)
generate.casecontam.regress.dummies(n, p, pd, probd, A, sigma, b, l, k, cp)
Arguments
n | integer indicating the number of observations to be generated.
p | integer indicating the number of continuous variables to be generated.
pd | integer indicating the number of dummy variables to be generated.
probd | vector of quantiles of length
A | a correlation matrix. See also
sigma | residual standard deviation.
b | vector of regression coefficients.
k | size of cellwise outliers and vertical outliers. See Leung et al. for details.
l | size of leverage outliers. See Leung et al. for details.
cp | proportion of cell- or case-wise contamination. Maximum of 10% for cellwise and 50% for casewise.
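To make the roles of `A`, `sigma`, `b`, `k` and `cp` concrete, the following base-R sketch generates a clean regression sample and then overwrites a random fraction of covariate cells with the outlier size. It is only an illustration of the idea: `sketch_cellcontam` is a hypothetical helper, not a function of this package, and the assumption that each cell is flagged independently with probability `cp` and set to exactly `k` may differ from what `generate.cellcontam.regress` does (see Leung et al. for the actual design).

```r
# Hypothetical helper, NOT part of robreg3S: a rough illustration of
# cellwise contamination under the assumptions stated above.
sketch_cellcontam <- function(n, p, A, sigma, b, k, cp) {
  stopifnot(nrow(A) == p, ncol(A) == p, length(b) == p, cp >= 0, cp <= 1)
  # Clean covariates: each row is N(0, A), built from the Cholesky factor of A.
  x <- matrix(rnorm(n * p), n, p) %*% chol(A)
  # Clean responses from the linear model y = x b + error.
  y <- as.vector(x %*% b) + sigma * rnorm(n)
  # Assumption: flag each cell independently with probability cp,
  # then overwrite the flagged cells with the outlier size k.
  hit <- matrix(runif(n * p) < cp, n, p)
  x[hit] <- k
  list(x = x, y = y, contaminated = hit)
}

set.seed(1)
p <- 4
A <- diag(p)      # identity correlation matrix, for simplicity
b <- rep(1, p)
dat <- sketch_cellcontam(n = 200, p = p, A = A, sigma = 0.5, b = b, k = 10, cp = 0.05)
mean(dat$contaminated)  # realized fraction of contaminated cells
```

Because cells are flagged independently, even a small `cp` spreads outliers over many rows, which is why the cellwise setting is capped at a much lower proportion than the casewise one.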
Value
A list with components:
x | multivariate normal sample with cell- or case-wise contamination.
y | vector of responses.
dummies | vector of dummies.
Author(s)
Andy Leung <andy.leung@stat.ubc.ca>, Hongyang Zhang, Ruben H. Zamar
References
Leung, A., Zamar, R.H., and Zhang, H. (2015). Robust regression estimation and inference in the presence of cellwise and casewise contamination. arXiv:1509.02564.
See Also
Examples
##################################################
## Cellwise contaminated data simulation
## (continuous covariates only)
set.seed(10)
b <- 10*generate.randbeta(p=15)
A <- generate.randcorr(cond=100, p=15)
dat <- generate.cellcontam.regress(n=300, p=15, A=A, sigma=0.5, b=b, k=10, cp=0.05)
## LS
fit.LS <- lm( y ~ x, dat)
mean((coef(fit.LS)[-1] - b)^2)
## MM regression
fit.MM <- robustbase::lmrob( y ~ x, dat)
mean((coef(fit.MM)[-1] - b)^2)
## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)
##################################################
## Casewise contaminated data simulation
## (continuous covariates only)
set.seed(10)
b <- 10*generate.randbeta(p=10)
A <- generate.randcorr(cond=100, p=10)
dat <- generate.casecontam.regress(n=200, p=10, A=A, sigma=0.5, b=b, l=8, k=10, cp=0.10)
## LS
fit.LS <- lm( y ~ x, dat)
mean((coef(fit.LS)[-1] - b)^2)
## MM regression
fit.MM <- robustbase::lmrob( y ~ x, dat)
mean((coef(fit.MM)[-1] - b)^2)
## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)
## Not run:
##################################################
## Cellwise contaminated data simulation
## (continuous and dummy covariates)
set.seed(10)
b <- 10*generate.randbeta(p=15)
A <- generate.randcorr(cond=100, p=15)
dat <- generate.cellcontam.regress.dummies(n=300, p=12, pd=3,
probd=c(1/2,1/3,1/4), A=A, sigma=0.5, b=b, k=10, cp=0.05)
## LS
fit.LS <- lm( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.LS)[-1] - b)^2)
## MM regression
fit.MM <- robustbase::lmrob( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.MM)[-1] - b)^2)
## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, dummies=dat$dummies, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)
##################################################
## Casewise contaminated data simulation
## (continuous and dummy covariates)
set.seed(10)
b <- 10*generate.randbeta(p=15)
A <- generate.randcorr(cond=100, p=15)
dat <- generate.casecontam.regress.dummies(n=300, p=12, pd=3,
probd=c(1/2,1/3,1/4), A=A, sigma=0.5, b=b, l=7, k=10, cp=0.10)
## LS
fit.LS <- lm( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.LS)[-1] - b)^2)
## MM regression
fit.MM <- robustbase::lmrob( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.MM)[-1] - b)^2)
## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, dummies=dat$dummies, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)
## End(Not run)
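
Each example above scores a fit with the same expression, the mean squared error of the estimated slopes against the true `b` with the intercept dropped. A small wrapper can make that comparison less repetitive. The sketch below is illustrative only: `coef_mse` is a hypothetical name, not a function of this package, and it is demonstrated on clean data simulated in base R so that it runs without any additional packages.

```r
# Hypothetical helper, NOT part of robreg3S: mean squared error of the
# estimated slopes, dropping the intercept as the examples do.
coef_mse <- function(fit, b) {
  est <- coef(fit)[-1]
  # Guard against comparing coefficient vectors of different lengths.
  stopifnot(length(est) == length(b))
  mean((est - b)^2)
}

set.seed(10)
n <- 100
b <- c(2, -1, 0.5)
x <- matrix(rnorm(n * length(b)), n, length(b))
y <- as.vector(x %*% b) + 0.5 * rnorm(n)

fit.LS <- lm(y ~ x)
coef_mse(fit.LS, b)
```

The same call should apply to any fit whose `coef()` returns the intercept first, as the examples assume for the `lm`, `lmrob` and `robreg3S` fits.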