addMissing {SimDesign} | R Documentation |
Add missing values to a vector given an MCAR, MAR, or MNAR scheme
Description
Given an input vector, replace elements of this vector with missing values according to some scheme.
The default method uses an MCAR scheme in which, on average, 10% of the values are replaced with
NAs. MAR and MNAR schemes are supported by replacing the default fun argument.
Usage
addMissing(y, fun = function(y, rate = 0.1, ...) rep(rate, length(y)), ...)
Arguments
y |
an input vector that should contain missing data in the form of NA values |
fun |
a user-defined function indicating the missing data mechanism for each element in y |
... |
additional arguments to be passed to fun |
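The default shown in Usage and the functions in Examples suggest that fun returns one missingness probability per element of y. The sketch below illustrates that assumed contract in base R. The helper apply_missing_sketch() and the name mcar_fun are hypothetical, written only for illustration, and are not the SimDesign implementation.

```r
# Hypothetical helper (not part of SimDesign): replaces y[i] with NA
# with probability p[i], where p is whatever `fun` returns.
apply_missing_sketch <- function(y, fun, ...) {
  p <- fun(y, ...)
  # Assumed contract: one probability in [0, 1] per element of y
  stopifnot(length(p) == length(y), all(p >= 0 & p <= 1))
  y[runif(length(y)) < p] <- NA
  y
}

# Same shape as the default shown in Usage: a constant rate (MCAR)
mcar_fun <- function(y, rate = 0.1, ...) rep(rate, length(y))

set.seed(1)
y <- rnorm(1000)
ymiss <- apply_missing_sketch(y, mcar_fun, rate = 0.5)
mean(is.na(ymiss))  # observed proportion missing, close to 0.5
```

Extra arguments such as rate travel through ... to the user function, which is the same route the Examples use to hand X to a custom fun.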
Details
Given an input vector y, and other relevant variables inside (X) and outside (Z) the dataset, the three types of missingness are:
- MCAR
Missing completely at random (MCAR). This is realized by randomly sampling the values of the input vector (y) irrespective of the possible values in X and Z. The missing values therefore do not depend on any characteristics of the data.
- MAR
Missing at random (MAR). This is realized when values in the dataset (X) predict the missing data mechanism in y; conceptually this is equivalent to P(y = NA | X). This requires the user to define a custom missing data function.
- MNAR
Missing not at random (MNAR). This is similar to MAR except that the missing mechanism comes from the value of y itself or from variables outside the working dataset; conceptually this is equivalent to P(y = NA | X, Z, y). This requires the user to define a custom missing data function.
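The three mechanisms differ only in what the missingness probability is allowed to depend on. The following is a minimal base-R sketch with made-up data: the covariate x, the coefficients, the 20% rate, and the helper draw() are all assumptions chosen for illustration, and none of it comes from the package.

```r
set.seed(42)
n <- 5000
x <- rnorm(n)             # observed covariate, playing the role of X
y <- 0.5 * x + rnorm(n)   # variable that will receive missing values

# Probability that each y[i] is replaced by NA under each mechanism
p_mcar <- rep(0.2, n)            # constant: ignores X and y
p_mar  <- plogis(-1 + 1.5 * x)   # depends only on the observed x
p_mnar <- plogis(-1 + 1.5 * y)   # depends on y itself

# Draw one Bernoulli missingness indicator per element
draw <- function(p) runif(length(p)) < p
miss <- list(MCAR = draw(p_mcar), MAR = draw(p_mar), MNAR = draw(p_mnar))

# Mean of the observed y values under each mechanism, next to the full-data mean
obs_means <- sapply(miss, function(m) mean(y[!m]))
round(c(full = mean(y), obs_means), 3)
```

In this setup the observed mean under MCAR should stay near the full-data mean, while under MAR and MNAR larger values are more likely to be dropped, so the observed mean is pulled downward.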
Value
the input vector y with the sampled NA values (according to the fun scheme)
Author(s)
Phil Chalmers rphilip.chalmers@gmail.com
References
Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248
Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953
Examples
## Not run:
set.seed(1)
y <- rnorm(1000)
## 10% missing rate with default FUN
head(ymiss <- addMissing(y), 10)
## 50% missing with default FUN
head(ymiss <- addMissing(y, rate = .5), 10)
## missing values only when female and low
X <- data.frame(group = sample(c('male', 'female'), 1000, replace=TRUE),
                level = sample(c('high', 'low'), 1000, replace=TRUE))
head(X)
fun <- function(y, X, ...){
    p <- rep(0, length(y))
    p[X$group == 'female' & X$level == 'low'] <- .2
    p
}
ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)
## missingness as a function of elements in X (i.e., a type of MAR)
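fun <- function(y, X){
    # missingness with a logistic regression approach
    df <- data.frame(y, X)
    # design matrix columns: (Intercept), groupmale, levellow
    mm <- model.matrix(y ~ group + level, df)
    cfs <- c(-5, 2, 3) # intercept, group, and level coefs
    z <- cfs %*% t(mm) # linear predictor, one value per element of y
    plogis(z)          # inverse logit gives the missingness probabilities
}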
ymiss <- addMissing(y, X, fun=fun)
tail(cbind(ymiss, X), 10)
## missing values when y elements are large (i.e., a type of MNAR)
fun <- function(y) ifelse(abs(y) > 1, .4, 0)
ymiss <- addMissing(y, fun=fun)
tail(cbind(y, ymiss), 10)
## End(Not run)